CN111460790A - Method and device for determining English place name and common name, translation equipment and storage medium - Google Patents


Info

Publication number
CN111460790A
Authority
CN
China
Prior art keywords
words
word
place name
name
english
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010234789.2A
Other languages
Chinese (zh)
Other versions
CN111460790B (en)
Inventor
毛曦
马维军
王继周
王春苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Academy of Surveying and Mapping
Original Assignee
Chinese Academy of Surveying and Mapping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Academy of Surveying and Mapping filed Critical Chinese Academy of Surveying and Mapping
Priority to CN202010234789.2A priority Critical patent/CN111460790B/en
Publication of CN111460790A publication Critical patent/CN111460790A/en
Application granted granted Critical
Publication of CN111460790B publication Critical patent/CN111460790B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and a device for determining common names of English place names, a translation device and a storage medium. The method comprises: preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library; storing the words of the target place name sample in reverse order to obtain a target tree structure; searching from the tail of the target tree structure; if the current word is in the initial English place name common name library, expanding the current word forward into a two-word phrase, and adding the two-word phrase to the library if its first frequency, given the one-word tail, is greater than a first preset threshold; expanding the two-word tail forward into a three-word phrase, and adding the three-word phrase to the library if its second frequency, given the two-word tail, is greater than a second preset threshold; and continuing until the search is finished, at which point the update of the initial library is complete and the target English place name common name library is obtained. Increasing the number of known place name common names improves the efficiency and accuracy of place name translation.

Description

Method and device for determining English place name and common name, translation equipment and storage medium
Technical Field
The invention relates to the technical field of place name translation in geographic information, in particular to a method and a device for determining a common name of an English place name, translation equipment and a storage medium.
Background
Place name translation is the translation of the expression of a geographic entity in one language into an expression in another language. Place names are generally divided into common names and proper names. A place name common name is a general word that summarizes what certain places have in common and indicates the type of feature; a place name proper name is a specific word that refers to one particular geographic entity, distinguishes it from similar features, and identifies which place is meant. The guide for translating foreign-language place names into Chinese characters (GB/T17693.3-2009) lays down the guiding principle that proper names are generally transliterated and common names are generally translated by meaning.
Automatic place name translation, or machine place name translation, is part of named entity translation in machine translation. Machine translation of a place name has to handle the common name and the proper name separately, using a different translation method for each, so the two must be distinguished effectively. For common names, building a common name library is an effective way both to distinguish common names from proper names and to translate the common names. In the related art, the set of known common names used in machine translation of English place names is insufficient, so the precision and efficiency of the machine translation are low.
Disclosure of Invention
In view of the above, a method and an apparatus for determining common names of English place names, a translation device and a storage medium are provided, to solve the problems of low translation precision and low translation efficiency caused by the limited number of place name common names in the prior art.
The invention adopts the following technical scheme:
In a first aspect, an embodiment of the present application provides a method for determining common names of English place names, where the method includes:
preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library;
storing the words of the target place name sample in reverse order based on a prefix tree structure to obtain a target tree structure;
searching from the tail of the target tree structure by using the initial English place name common name library;
if the current word is in the initial English place name common name library, expanding the current word forward into a two-word phrase, calculating a first frequency of the two-word phrase given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two-word phrase to the initial English place name common name library to update the library;
expanding the two-word tail forward into a three-word phrase, calculating a second frequency of the three-word phrase given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three-word phrase to the initial English place name common name library to update the library;
and continuing until the search of phrases with N-word tails is completed, at which point the update of the initial English place name common name library is complete and a target English place name common name library is obtained, where N is the number of words in the longest place name in the target place name sample.
In a second aspect, an embodiment of the present application provides an apparatus for determining common names of English place names, where the apparatus includes:
a preprocessing module, used for preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library;
a storage module, used for storing the words of the target place name sample in reverse order based on a prefix tree structure to obtain a target tree structure;
a searching module, used for searching from the tail of the target tree structure by using the initial English place name common name library;
a first updating module, used for expanding the current word forward into a two-word phrase when the current word is in the initial English place name common name library, calculating a first frequency of the two-word phrase given the one-word tail, and adding the two-word phrase to the library to update it if the first frequency is greater than a first preset threshold;
a second updating module, used for expanding the two-word tail forward into a three-word phrase, calculating a second frequency of the three-word phrase given the two-word tail, and adding the three-word phrase to the library to update it if the second frequency is greater than a second preset threshold;
and a target English place name common name library determining module, used for completing the update of the initial English place name common name library once the search of phrases with N-word tails is completed, so as to obtain a target English place name common name library, where N is the number of words in the longest place name in the target place name sample.
In a third aspect, an embodiment of the present application provides a translation apparatus, including:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program, and the computer program is at least used for executing the method for determining the common name of the English place name according to the first aspect of the embodiment of the application;
the processor is used for calling and executing the computer program in the memory.
In a fourth aspect, the present application provides a storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the steps in the method for determining a common name of an english place name according to the first aspect.
According to this technical solution, English place names are first preprocessed to obtain a target place name sample; the words of the target place name sample are then stored in reverse order in a prefix tree structure to obtain a target tree structure; the target tree structure is searched, and the one-word, two-word, three-word and longer phrases that satisfy the preset frequency conditions are added to the initial English place name common name library. This addresses the shortage of common names in machine translation of English place names: more English place name common names can be identified, and using the newly discovered common names in machine translation of English place names can improve its precision and efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for determining common names of English place names according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a target tree structure for storing place names based on a prefix tree structure according to an embodiment of the present invention;
Fig. 3 is a flowchart of another method for determining common names of English place names according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for determining common names of English place names according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a translation device applicable to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Examples
Fig. 1 is a flowchart of a method for determining common names of English place names according to an embodiment of the present invention. The method may be performed by the apparatus for determining common names of English place names according to an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. Referring to Fig. 1, the method may specifically include the following steps:
S101, preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library.
Specifically, a preset number of English place names may be used as samples, for example 100,000 English place names from Canberra, Australia. The number of samples and the selection range can be set or changed as required; the figure here is only an example and is not a specific limitation. To reduce computation and improve the accuracy of common name discovery, the preset number of English place names is preprocessed, for example by removing words that are already known standard English place name common names; the samples that remain after this screening are called the target place name sample. In addition, the target place name sample is processed according to certain rules to obtain an initial English place name common name library, which is subsequently updated to obtain the target English place name common name library determined in the embodiment of the present application. The entries in the target library are the English place name common names determined in this embodiment.
S102, reversely storing each word in the target place name sample based on the prefix tree structure to obtain a target tree structure.
A prefix tree, also called a word search tree or trie, is a tree structure and a variant of the hash tree. It is typically used to count, sort and store large numbers of strings, and is therefore often used by search engine systems for text word frequency statistics. It uses the common prefixes of strings to reduce query time and to minimize unnecessary string comparisons, so its query efficiency is higher than that of a hash tree.
Specifically, the target place name sample comprises a plurality of English place names, and each place name is composed of at least one word. In the embodiment of the application, each word in the target place name sample is reversely stored based on the prefix tree structure, so that the target tree structure is obtained.
In a specific example, Fig. 2 shows a schematic diagram of a target tree structure for storing place names based on a prefix tree structure. Optionally, each node of the target tree structure stores the current word together with the frequency count of the place names whose tail is the phrase formed by the current word and the words of all its ancestor nodes. Referring to Fig. 2, the School node records the frequency count of place names ending with School, that is, names of the form [X] School, where [X] is a placeholder; the Public node below it records the frequency count of place names ending with Public School, that is, names of the form [X] Public School.
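The node layout described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the names `TrieNode`, `build_reverse_trie` and `tail_count`, and the four sample place names, are hypothetical and chosen only to mirror the School / Public School example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of the reversed prefix tree (hypothetical layout)."""
    word: str = ""
    # Number of place names whose tail is the path from the root to this node.
    count: int = 0
    children: dict = field(default_factory=dict)


def build_reverse_trie(place_names):
    """Insert each place name word by word, last word first."""
    root = TrieNode()
    for name in place_names:
        node = root
        for word in reversed(name.split()):
            node = node.children.setdefault(word, TrieNode(word))
            node.count += 1
    return root


def tail_count(root, tail_words):
    """Return how many place names end with the given word sequence."""
    node = root
    for word in reversed(tail_words):
        node = node.children.get(word)
        if node is None:
            return 0
    return node.count


# Made-up sample data for illustration only.
names = ["Bonya Public School", "Hill Public School", "Lake High School", "Red Hill"]
trie = build_reverse_trie(names)
print(tail_count(trie, ["School"]))            # names of the form [X] School
print(tail_count(trie, ["Public", "School"]))  # names of the form [X] Public School
```

Because the words are inserted last word first, every tail shared by several place names maps to a single path from the root, so the count for a tail is read off one node.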
S103, searching from the tail of the target tree structure by using the initial English place name common name library.
Specifically, using the determined initial English place name common name library, the search starts from the words at the tail nodes of the target tree structure. For each node, it is judged whether the phrase formed by the node's word and the words of all its ancestor nodes, taken as a tail, satisfies the occurrence frequency condition; if so, the phrase ending at the current word can be taken as an English place name common name.
S104, if the current word is in the initial English place name common name library, expanding the current word forward into a two-word phrase, calculating the first frequency of the two-word phrase given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two-word phrase to the initial English place name common name library to update the library.
A greedy algorithm always makes the choice that looks best at the current step when solving a problem. That is, it does not consider global optimality and instead produces a solution that is locally optimal in some sense. A greedy algorithm does not yield a globally optimal solution for every problem; the key is the choice of greedy strategy, which must have no aftereffect: the process that led to a state must not influence later states, which depend only on the current state.
In the embodiment of the present application, multi-word place name common names are discovered based on a greedy algorithm. Specifically, a one-word tail refers to the last word of a place name, for example School, and the corresponding two-word tail refers to the last two words, for example Public School. Optionally, the first frequency of a two-word phrase given the one-word tail is the ratio of the frequency count of place names ending with the two-word phrase (obtained by expanding the one-word tail forward) to the frequency count of place names ending with the one-word tail. Taking School as the current word: given that School is in the initial English place name common name library, School is expanded forward into two-word phrases such as Public School, High School or Bonya School, and the frequency of each phrase given the one-word tail is calculated, for example f(Public School) / f(School), where f(·) denotes the frequency count of place names ending with the given phrase. If f(Public School) / f(School) is greater than the first preset threshold, Public School is considered a common name and is added to the initial English place name common name library as a two-word common name to update the library.
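The ratio f(Public School) / f(School) can be illustrated with a short sketch. It counts tails with a `Counter` instead of walking the tree, which is a simplification for brevity; the helper names `tail_counts` and `expand_tail`, the sample names and the threshold value 0.5 are all assumptions, since the source does not state a value for the first preset threshold.

```python
from collections import Counter


def tail_counts(place_names, max_len):
    """Count how often each tail of 1..max_len words closes a place name."""
    counts = Counter()
    for name in place_names:
        words = name.split()
        for k in range(1, min(max_len, len(words)) + 1):
            counts[tuple(words[-k:])] += 1
    return counts


def expand_tail(counts, tail, threshold):
    """Return tails one word longer than `tail` whose frequency,
    conditioned on `tail`, exceeds `threshold`."""
    base = counts[tail]
    found = {}
    if base == 0:
        return found
    for cand, c in counts.items():
        if len(cand) == len(tail) + 1 and cand[1:] == tail:
            ratio = c / base  # e.g. f(Public School) / f(School)
            if ratio > threshold:
                found[cand] = ratio
    return found


# Made-up sample data; 0.5 is an assumed threshold, not a value from the source.
names = ["Bonya Public School", "Hill Public School",
         "Lake Public School", "Lake High School"]
counts = tail_counts(names, max_len=2)
print(expand_tail(counts, ("School",), threshold=0.5))
# {('Public', 'School'): 0.75}
```

Here Public School closes three of the four names ending in School, so its conditional frequency is 0.75 and it passes the assumed threshold, while High School (0.25) does not.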
S105, expanding the two-word tail forward into a three-word phrase, calculating a second frequency of the three-word phrase given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three-word phrase to the initial English place name common name library to update the library.
Specifically, when the last two words of a place name are in the common name library, the two-word tail is expanded forward into three-word phrases and common name discovery continues. Optionally, the second frequency is the ratio of the frequency count of place names ending with the three-word phrase (obtained by expanding the two-word tail forward) to the frequency count of place names ending with the two-word tail. The frequency of each three-word phrase given the two-word tail is calculated, and the three-word phrases that satisfy the condition, that is, whose second frequency is greater than the second preset threshold, are added to the initial English place name common name library as three-word common names to update the library. If the two-word tail is not in the updated library, the second-to-last and third-to-last words of the current place name are taken as the two-word tail, and three-word common name discovery proceeds from there.
S106, continuing until the search of phrases with N-word tails is completed, at which point the update of the initial English place name common name library is complete and a target English place name common name library is obtained, where N is the number of words in the longest place name in the target place name sample.
Specifically, common names with more words are discovered by analogy until the search of phrases with N-word tails is completed. In this way, two-word, three-word and longer common names, up to N-word common names, are all added to the initially constructed English place name common name library, so that all newly discovered place name common names are obtained and the target English place name common name library results. N is the number of words in the longest place name in the target place name sample; for example, N may be 5.
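Steps S104 to S106 taken together can be sketched as one greedy loop that grows tails one word at a time. This is a simplified illustration under stated assumptions: `discover_common_names` is a hypothetical name, the thresholds `[0.5, 0.5]` and the sample names are invented, the last threshold is reused for tails longer than three words, and the fallback for tails that are missing from the library is not modelled.

```python
from collections import Counter


def discover_common_names(place_names, initial_library, thresholds):
    """Greedily grow the library from one-word tails up to N-word tails.

    thresholds[0] plays the role of the first preset threshold (two-word
    phrases), thresholds[1] the second (three-word phrases); the last entry
    is reused for longer phrases (an assumption of this sketch).
    """
    split = [name.split() for name in place_names]
    n_max = max(len(words) for words in split)  # N: longest place name
    # Frequency count of every tail of every place name.
    counts = Counter(
        tuple(words[-k:]) for words in split for k in range(1, len(words) + 1)
    )
    library = {(w,) for w in initial_library}
    for k in range(1, n_max):  # expand k-word tails into (k + 1)-word phrases
        threshold = thresholds[min(k - 1, len(thresholds) - 1)]
        for tail, c in list(counts.items()):
            if len(tail) == k + 1 and tail[1:] in library:
                if c / counts[tail[1:]] > threshold:
                    library.add(tail)
    return library


# Made-up sample data for illustration only.
names = ["Bonya Junior Public School", "Hill Junior Public School",
         "Lake Public School", "Lake High School"]
library = discover_common_names(names, {"School"}, thresholds=[0.5, 0.5])
print(sorted(library, key=len))
# [('School',), ('Public', 'School'), ('Junior', 'Public', 'School')]
```

A longer phrase is only considered when the shorter tail it extends has already been accepted, which is the greedy, one-step-at-a-time behaviour the text describes.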
As an example, if the current word is not in the initial English place name common name library, the next word in the search order (the word that precedes the current word in the place name) is taken as the tail, and it is judged whether that word is in the initial English place name common name library. If it is, that word is expanded forward into a two-word phrase and the judgment of whether to add the two-word phrase to the library is performed; this continues until the search of phrases with N-word tails is completed, at which point the update of the initial library is complete and the target English place name common name library is obtained.
In other words, when the current word is not in the initial English place name common name library, the search moves on to the next word, treats it as the tail, and repeats the searching steps above until the search of phrases with N-word tails is completed. In a specific example, for a place name of the form [X] Public School, if School is not in the single-word common name library, common name discovery is performed again in the forward direction, starting from Public.
It should be noted that this case arises within the common name discovery process described above; it is not a step that occurs after S106, and it is given only as an example.
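The fallback case can be pictured as a backward scan over the words of one place name. The sketch below is one possible reading of the text, not a confirmed detail of the method; the function name `find_start_position` and the sample words are hypothetical.

```python
def find_start_position(words, single_word_library):
    """Scan from the last word toward the first and return the index of the
    first word found in the single-word common name library, or None."""
    for i in range(len(words) - 1, -1, -1):
        if words[i] in single_word_library:
            return i
    return None


# For a name of the form [X] Public School: if School is not in the library,
# discovery restarts from Public (index 1).
words = ["bonya", "public", "school"]
print(find_start_position(words, {"public"}))            # 1
print(find_start_position(words, {"public", "school"}))  # 2
```

The returned index marks the word from which forward expansion into two-word and longer phrases would begin.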
According to this technical solution, English place names are first preprocessed to obtain a target place name sample; the words of the target place name sample are then stored in reverse order in a prefix tree structure to obtain a target tree structure; the target tree structure is searched, and the one-word, two-word, three-word and longer phrases that satisfy the preset frequency conditions are added to the initial English place name common name library. This addresses the shortage of common names in machine translation of English place names: more English place name common names can be identified, and using the newly discovered common names in machine translation of English place names can improve its precision and efficiency.
Fig. 3 is a flowchart of a method for determining a common name of an english place name according to another embodiment of the present invention, which is implemented on the basis of the foregoing embodiment. Referring to fig. 3, the method may specifically include the following steps:
S301, obtaining a preset number of English place names, and converting them to a uniform letter case to serve as the place name sample.
Specifically, after a preset number of English place names are obtained, they are first converted uniformly to upper case or to lower case and then split into individual English words. In the subsequent common name discovery, this first avoids treating the upper-case and lower-case forms of the same word as two separate common names, and second improves the efficiency of common name identification.
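Case unification and word splitting can be sketched in a few lines. Lower case is chosen here arbitrarily (the text allows either case), and `normalize_place_names` and the sample strings are hypothetical.

```python
def normalize_place_names(raw_names):
    """Lower-case each place name and split it into words; skip empty entries."""
    samples = []
    for name in raw_names:
        words = name.lower().split()
        if words:
            samples.append(words)
    return samples


print(normalize_place_names(["Bonya PUBLIC School", "  ", "Lake High School"]))
# [['bonya', 'public', 'school'], ['lake', 'high', 'school']]
```

After this step "PUBLIC" and "Public" map to the same token, so they cannot be counted as two different common names.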
S302, comparing each word in the place name sample with a standard English place name common name library.
Specifically, the standard English place name common name library stores place name common names that are widely used and publicly known. In the embodiment of the present application, each word in the place name sample is compared with this library so that words already known to be place name common names can be filtered out and excluded from the subsequent common name discovery, which improves discovery efficiency.
S303, filtering out words belonging to the common names in the standard English place name common name database in the place name sample to obtain the target place name sample.
Specifically, words belonging to the common names in the standard English place name common name library in the place name sample are filtered, and the obtained sample is called a target place name sample. In the subsequent processing process, the target place name sample is taken as a reference.
S304, counting the frequency of each word in the target place name sample, screening out the words larger than a preset frequency threshold value, and forming an initial English place name common name library.
Specifically, compared with the place name sample, the target place name sample has had some words filtered out. At this point the frequency of each word in the target place name sample is counted, where the frequency may be the ratio of the number of occurrences of the word to the total number of words in the target place name sample. Words whose frequency is greater than a preset frequency threshold are then screened out; optionally, the preset frequency threshold may be 0.9. The screened words form the initial English place name common name library. The initial library is a single-word common name library, and the subsequent updating of the library is the process of adding two-word common names, three-word common names and so on to it.
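Steps S302 to S304 can be sketched as one function. Two points are assumptions of this sketch rather than statements from the source: filtering is interpreted as dropping the known common name words from each place name, and the threshold 0.4 is picked only so that the tiny made-up sample produces a non-empty result. The names `build_initial_library`, `samples` and `standard` are hypothetical.

```python
from collections import Counter


def build_initial_library(samples, standard_library, freq_threshold):
    """Filter out known common names, then keep the words whose relative
    frequency in the remaining sample exceeds the threshold."""
    # S302/S303: drop words already in the standard common name library
    # (one interpretation of "filtering"; the source does not spell it out).
    target = [[w for w in words if w not in standard_library] for words in samples]
    target = [words for words in target if words]
    # S304: relative frequency = occurrences of the word / total word count.
    counts = Counter(w for words in target for w in words)
    total = sum(counts.values())
    initial = {w for w, c in counts.items() if c / total > freq_threshold}
    return target, initial


# Made-up, already lower-cased sample data.
samples = [["bonya", "school"], ["hill", "school"], ["lake", "school"], ["red", "river"]]
standard = {"river"}
target, initial = build_initial_library(samples, standard, freq_threshold=0.4)
print(initial)  # {'school'}
```

In this toy sample "school" accounts for 3 of the 7 remaining words, so it is the only word that clears the assumed threshold and becomes the single-word seed of the library.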
S305, storing the words of the target place name sample in reverse order based on the prefix tree structure to obtain a target tree structure.
S306, searching from the tail of the target tree structure by using the initial English place name common name library.
S307, if the current word is in the initial English place name common name library, expanding the current word forward into a two-word phrase, calculating the first frequency of the two-word phrase given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two-word phrase to the initial English place name common name library to update the library.
S308, expanding the two-word tail forward into a three-word phrase, calculating a second frequency of the three-word phrase given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three-word phrase to the initial English place name common name library to update the library.
S309, continuing until the search of phrases with N-word tails is completed, at which point the update of the initial English place name common name library is complete and a target English place name common name library is obtained, where N is the number of words in the longest place name in the target place name sample.
In the embodiment of the present application, a preset number of English place names are obtained and converted to a uniform letter case to serve as the place name sample; each word in the place name sample is compared with a standard English place name common name library; the words in the place name sample that belong to the standard library are filtered out to obtain the target place name sample; and the frequency of each word in the target place name sample is counted, with the words above a preset frequency threshold screened out to form the initial English place name common name library. In this way, well-known standard common names are filtered out and an initial English place name common name library suited to this embodiment is formed; two-word common names, three-word common names and so on are then added on the basis of this initial library to accomplish common name discovery. This reduces computation and improves the efficiency and accuracy of common name discovery.
Fig. 4 is a schematic structural diagram of an apparatus for determining a name of a place in english according to an embodiment of the present invention, which is adapted to execute a method for determining a name of a place in english according to an embodiment of the present invention. As shown in fig. 4, the apparatus may specifically include: the system comprises a preprocessing module 401, a storage module 402, a search module 403, a first updating module 404, a second updating module 405 and a target English place name and name bank determining module 406.
The preprocessing module 401 is configured to preprocess a preset number of English place names to obtain a target place name sample and an initial English place name common name library; the storage module 402 is configured to store each word in the target place name sample in reverse order based on a prefix tree structure, so as to obtain a target tree structure; the search module 403 is configured to start searching from the tail of the target tree structure by using the initial English place name common name library; the first updating module 404 is configured to, when the current word is in the initial English place name common name library, expand the current word forward into two words, calculate a first frequency of the two words given the one-word tail, and add the two words to the initial English place name common name library if the first frequency is greater than a first preset threshold, so as to update the initial English place name common name library; the second updating module 405 is configured to expand the two-word tail forward into three words, calculate a second frequency of the three words given the two-word tail, and add the three words to the initial English place name common name library if the second frequency is greater than a second preset threshold, so as to update the initial English place name common name library; and the target English place name common name library determining module 406 is configured to complete updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library, where N is the number of words in the longest place name in the target place name sample.
According to the technical scheme, the English place names are first preprocessed to obtain the target place name sample, and each word in the target place name sample is then stored in reverse order based on a prefix tree structure to obtain the target tree structure; the search is carried out on the target tree structure, and the one-word, two-word, three-word and longer word tails that satisfy the preset frequency conditions are added to the initial English place name common name library. This addresses the shortage of common names in machine translation of English place names: with the technical scheme of this application, more English place name common names can be identified, and using the newly discovered common names in English place name machine translation can improve the precision and efficiency of the translation.
Optionally, the apparatus further includes a third updating module, configured to: after the search is started from the tail of the target tree structure by using the initial English place name common name library, if the current word is not in the initial English place name common name library, take the next word of the current word as a suffix and continue to determine whether that next word is in the initial English place name common name library; if so, expand that next word forward into two words and perform the operation of determining whether the corresponding two words are added to the initial English place name common name library; and complete updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library.
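The search and update operations described above can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: a flat counter of word tails stands in for the reversed prefix tree, the names `suffix_counts`, `discover_common_names` and `thresholds` are hypothetical, a tail that is not in the library is simply skipped, and it is assumed that only tails already accepted into the library are expanded further.

```python
from collections import Counter


def suffix_counts(names):
    """Count how many place names end with each word tail (a tuple of words)."""
    counts = Counter()
    for name in names:
        words = name.lower().split()
        for k in range(1, len(words) + 1):
            counts[tuple(words[-k:])] += 1
    return counts


def discover_common_names(names, initial_library, thresholds):
    """Grow the library from one-word tails towards N-word tails.

    thresholds[k] is the (hypothetical) preset threshold applied when a
    k-word tail is expanded forward to k + 1 words; 0.5 is used if absent.
    """
    counts = suffix_counts(names)
    library = {tuple(entry.lower().split()) for entry in initial_library}
    max_len = max(len(name.split()) for name in names)  # N in the text
    for k in range(1, max_len):
        threshold = thresholds.get(k, 0.5)
        for tail, tail_count in counts.items():
            if len(tail) != k + 1:
                continue
            shorter = tail[1:]  # the k-word tail that was expanded forward
            if shorter not in library:
                continue  # assumption: only tails in the library are expanded
            if tail_count / counts[shorter] > threshold:
                library.add(tail)
    return {" ".join(tail) for tail in library}


names = [
    "Yellowstone National Park",
    "Yosemite National Park",
    "Glacier National Park",
    "Central Park",
    "Hyde Park",
]
result = discover_common_names(names, {"park"}, {1: 0.5, 2: 0.5})
```

With these made-up sample names, "national park" follows "park" in 3 of 5 names and is added, while no three-word tail passes the second threshold.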
Optionally, the preprocessing module 401 is specifically configured to:
acquiring a preset number of English place names, and converting them to a uniform letter case to serve as the place name sample;
comparing each word in the place name sample with a standard English place name common name library;
filtering out the words in the place name sample that belong to the common names in the standard English place name common name library, to obtain the target place name sample;
and counting the frequency of each word in the target place name sample, and screening out the words whose frequency is greater than a preset frequency threshold to form the initial English place name common name library.
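The four preprocessing operations can be sketched as below. This is a minimal sketch under stated assumptions: "filtering out" is read here as removing the words that match the standard library (the text could also be read differently), lowercase is chosen arbitrarily as the uniform case, and the function name `preprocess` and its parameters are hypothetical.

```python
from collections import Counter


def preprocess(place_names, standard_library, freq_threshold):
    """Return (target place name sample, initial common name library)."""
    # Uniform letter case; lowercase is an arbitrary choice for this sketch.
    sample = [name.lower().split() for name in place_names]
    standard = {word.lower() for word in standard_library}
    # Compare each word with the standard library and drop the matches
    # (assumed reading of "filtering out").
    target_sample = [[w for w in words if w not in standard] for words in sample]
    target_sample = [words for words in target_sample if words]
    # Count word frequencies and keep the words above the preset threshold.
    freq = Counter(w for words in target_sample for w in words)
    initial_library = {w for w, c in freq.items() if c > freq_threshold}
    return target_sample, initial_library


target_sample, initial_library = preprocess(
    ["Bear Creek Trail", "Pine Creek Trail", "Elk Lake"],
    standard_library={"Lake"},
    freq_threshold=1,
)
```

In this made-up example, "lake" is removed as a standard common name, and "creek" and "trail" each occur twice, so they form the initial library.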
Optionally, each node of the target tree structure stores the current word and the frequency of the place names whose word tail is the word sequence represented by that node together with all of its parent nodes.
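One way such a tree could look is sketched below, assuming each place name is inserted word by word from its last word to its first, so that the path from the root to a node spells a word tail and the node's count is the number of place names ending with that tail. The names `Node`, `build_reverse_trie` and `tail_count` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str = ""
    count: int = 0  # number of place names ending with the tail spelled by this path
    children: dict = field(default_factory=dict)


def build_reverse_trie(names):
    """Insert every place name in reverse word order."""
    root = Node()
    for name in names:
        node = root
        for word in reversed(name.lower().split()):
            node = node.children.setdefault(word, Node(word))
            node.count += 1
    return root


def tail_count(root, tail):
    """Look up how many place names end with the given word tail."""
    node = root
    for word in reversed(tail.lower().split()):
        node = node.children.get(word)
        if node is None:
            return 0
    return node.count


root = build_reverse_trie(
    ["Glacier National Park", "Yosemite National Park", "Hyde Park"]
)
```

Storing the names in reverse means every tail of a given length shares a prefix in the tree, so the count needed for the frequency ratio is read from a single node.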
Optionally, the first frequency of the two words given the one-word tail is: the ratio of the frequency of the place names whose word tail is the two-word tail obtained by expanding the one-word tail forward, to the frequency of the place names whose word tail is the one-word tail.
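Written as a formula, the ratio may be expressed as follows. The symbols are assumptions introduced for illustration only: $C(\cdot)$ denotes the number of place names in the target place name sample ending with the given word tail, $w_1$ is the one-word tail, and $w_2$ is the word placed in front of it.

```latex
f_1(w_2 \mid w_1) = \frac{C(w_2\, w_1)}{C(w_1)},
\qquad
f_2(w_3 \mid w_2\, w_1) = \frac{C(w_3\, w_2\, w_1)}{C(w_2\, w_1)}
```

As a made-up example, if 5 place names end with "park" and 3 of them end with "national park", then $f_1 = 3/5 = 0.6$, and "national park" would be added whenever the first preset threshold is below 0.6.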
The apparatus for determining a common name of an English place name provided by the embodiment of the invention can execute the method for determining a common name of an English place name provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
An embodiment of the present invention further provides a translation device. Referring to fig. 5, which is a schematic structural diagram of the translation device, the translation device includes: a processor 510, and a memory 520 coupled to the processor 510; the memory 520 is used for storing a computer program, the computer program being at least used for executing the method for determining a common name of an English place name in the embodiment of the present invention; the processor 510 is used to invoke and execute the computer program in the memory; the method for determining a common name of an English place name at least comprises the following steps: preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library; storing each word in the target place name sample in reverse order based on a prefix tree structure to obtain a target tree structure; searching from the tail of the target tree structure by using the initial English place name common name library; if the current word is in the initial English place name common name library, expanding the current word forward into two words, calculating a first frequency of the two words given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two words to the initial English place name common name library so as to update the initial English place name common name library; expanding the two-word tail forward into three words, calculating a second frequency of the three words given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three words to the initial English place name common name library so as to update the initial English place name common name library; and completing updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library, where N is the number of words in the longest place name in the target place name sample.
The embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the method for determining a common name of an English place name in the embodiment of the present invention, including: preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library; storing each word in the target place name sample in reverse order based on a prefix tree structure to obtain a target tree structure; searching from the tail of the target tree structure by using the initial English place name common name library; if the current word is in the initial English place name common name library, expanding the current word forward into two words, calculating a first frequency of the two words given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two words to the initial English place name common name library so as to update the initial English place name common name library; expanding the two-word tail forward into three words, calculating a second frequency of the three words given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three words to the initial English place name common name library so as to update the initial English place name common name library; and completing updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library, where N is the number of words in the longest place name in the target place name sample.
It is understood that the same or similar parts of the above embodiments may be referred to one another, and for content not described in detail in one embodiment, reference may be made to the same or similar content in other embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for determining a common name of an English place name is characterized by comprising the following steps:
preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library;
reversely storing each word in the target place name sample based on a prefix tree structure to obtain a target tree structure;
searching from the tail of the target tree structure by using the initial English place name common name library;
if the current word is in the initial English place name common name library, expanding the current word forward into two words, calculating a first frequency of the two words given the one-word tail, and if the first frequency is greater than a first preset threshold, adding the two words to the initial English place name common name library to update the initial English place name common name library;
expanding the two-word tail forward into three words, calculating a second frequency of the three words given the two-word tail, and if the second frequency is greater than a second preset threshold, adding the three words to the initial English place name common name library to update the initial English place name common name library;
and completing updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain a target English place name common name library, wherein N is the number of words in the longest place name in the target place name sample.
2. The method of claim 1, wherein, after starting the search from the tail of the target tree structure by using the initial English place name common name library, the method further comprises:
if the current word is not in the initial English place name common name library, taking the next word of the current word as a suffix and continuing to determine whether that next word is in the initial English place name common name library; if so, expanding that next word forward into two words, and performing the operation of determining whether the corresponding two words are added to the initial English place name common name library;
and completing updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library.
3. The method of claim 1, wherein preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library comprises:
acquiring a preset number of English place names, and converting them to a uniform letter case to serve as the place name sample;
comparing each word in the place name sample with a standard English place name common name library;
filtering out the words in the place name sample that belong to the common names in the standard English place name common name library, to obtain the target place name sample;
and counting the frequency of each word in the target place name sample, and screening out the words whose frequency is greater than a preset frequency threshold to form the initial English place name common name library.
4. The method of claim 1, wherein each node of the target tree structure stores a current word and the frequency of the place names whose word tail is the word sequence represented by that node together with all of its parent nodes.
5. The method of claim 1, wherein the first frequency of the two words given the one-word tail comprises: the ratio of the frequency of the place names whose word tail is the two-word tail obtained by expanding the one-word tail forward, to the frequency of the place names whose word tail is the one-word tail.
6. The method of claim 1, wherein the second frequency of the three words given the two-word tail comprises:
the ratio of the frequency of the place names whose word tail is the three-word tail obtained by expanding the two-word tail forward, to the frequency of the place names whose word tail is the two-word tail.
7. An apparatus for determining a common name of an English place name, comprising:
the preprocessing module is used for preprocessing a preset number of English place names to obtain a target place name sample and an initial English place name common name library;
the storage module is used for storing each word in the target place name sample in reverse order based on a prefix tree structure to obtain a target tree structure;
the searching module is used for searching from the tail of the target tree structure by using the initial English place name common name library;
the first updating module is used for expanding the current word forward into two words when the current word is in the initial English place name common name library, calculating a first frequency of the two words given the one-word tail, and adding the two words to the initial English place name common name library to update the initial English place name common name library if the first frequency is greater than a first preset threshold;
the second updating module is used for expanding the two-word tail forward into three words, calculating a second frequency of the three words given the two-word tail, and adding the three words to the initial English place name common name library to update the initial English place name common name library if the second frequency is greater than a second preset threshold;
and the target English place name common name library determining module is used for completing updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain a target English place name common name library, wherein N is the number of words in the longest place name in the target place name sample.
8. The apparatus according to claim 7, further comprising a third updating module, configured to: after the search is started from the tail of the target tree structure by using the initial English place name common name library, if the current word is not in the initial English place name common name library, take the next word of the current word as a suffix and continue to determine whether that next word is in the initial English place name common name library; if so, expand that next word forward into two words, and perform the operation of determining whether the corresponding two words are added to the initial English place name common name library;
and complete updating of the initial English place name common name library once the search of word tails of N words is finished, so as to obtain the target English place name common name library.
9. A translation apparatus, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program, the computer program being at least used for executing the method for determining a common name of an English place name according to any one of claims 1-6;
the processor is used for calling and executing the computer program in the memory.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the method for determining a common name of an English place name according to any one of claims 1 to 6.
CN202010234789.2A 2020-03-30 2020-03-30 Method and device for determining English place names and names, translation equipment and storage medium Active CN111460790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010234789.2A CN111460790B (en) 2020-03-30 2020-03-30 Method and device for determining English place names and names, translation equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010234789.2A CN111460790B (en) 2020-03-30 2020-03-30 Method and device for determining English place names and names, translation equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111460790A (en) 2020-07-28
CN111460790B (en) 2023-07-04

Family

ID=71684992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010234789.2A Active CN111460790B (en) 2020-03-30 2020-03-30 Method and device for determining English place names and names, translation equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111460790B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010025415A1 (en) * 2008-08-29 2010-03-04 Alibaba Group Holding Limited Determining core geographical information in a document
CN101876975A (en) * 2009-11-04 2010-11-03 中国科学院声学研究所 Identification method of Chinese place name
US20140101544A1 (en) * 2012-10-08 2014-04-10 Microsoft Corporation Displaying information according to selected entity type
US20140244730A1 (en) * 2013-02-27 2014-08-28 Pavlov Media, Inc. Resolver-based data storage and retrieval system and method
CN107622058A (en) * 2016-07-13 2018-01-23 北京四维图新科技股份有限公司 Make method, apparatus, electronic navigation chip and the server of the foreign language bank of geographical names
CN109710087A (en) * 2018-12-28 2019-05-03 北京金山安全软件有限公司 Input method model generation method and device

Also Published As

Publication number Publication date
CN111460790B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN108897842B (en) Computer readable storage medium and computer system
CN107832407B (en) Information processing method and device for generating knowledge graph and readable storage medium
CN109508339B (en) Data query method and device, terminal equipment and storage medium
CN112115227B (en) Data query method and device, electronic equipment and storage medium
US8364696B2 (en) Efficient incremental parsing of context sensitive programming languages
KR20120123241A (en) Query parsing for map search
KR20090048624A (en) Dynamic fragment mapping
EP3608801A1 (en) Method of rapidly searching element information in a bim model
CN112347767B (en) Text processing method, device and equipment
CN115509694B (en) Transaction processing method, device, electronic equipment and storage medium
CN113468204A (en) Data query method, device, equipment and medium
CN109189343B (en) Metadata disk-dropping method, device, equipment and computer-readable storage medium
CN112970011A (en) Recording pedigrees in query optimization
CN111737541B (en) Semantic recognition and evaluation method supporting multiple languages
CN111460000B (en) Backtracking data query method and system based on relational database
CN103870489A (en) Chinese name self-extension recognition method based on search logs
CN111460790A (en) Method and device for determining English place name and common name, translation equipment and storage medium
CN110704573A (en) Directory storage method and device, computer equipment and storage medium
CN116186041A (en) Data lake index creation method and device, electronic equipment and computer storage medium
CN115129738A (en) Cross-database data writing method, device and equipment
CN115238655A (en) Json data editing method and device
CN113536058A (en) Spatial index modification method, device, equipment and storage medium
CN117272938B (en) Dynamic limited domain decoding method, device and medium for text generation
CN111159218B (en) Data processing method, device and readable storage medium
CN111400320B (en) Method and device for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant