CN107870905A - Method for identifying specific vocabulary - Google Patents

Method for identifying specific vocabulary

Info

Publication number
CN107870905A
Authority
CN
China
Prior art keywords
noun
module
vocabulary
cutting
multiple feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711253593.2A
Other languages
Chinese (zh)
Other versions
CN107870905B (en)
Inventor
郑丽华
何征宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Language Network (wuhan) Information Technology Co Ltd
Original Assignee
Language Network (wuhan) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Language Network (wuhan) Information Technology Co Ltd
Priority to CN201711253593.2A
Publication of CN107870905A
Application granted
Publication of CN107870905B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, a system, and a computer-readable medium for identifying specific vocabulary in a document to be translated. With the method and system of the invention, most of the specific unconventional words that occur during translation can be identified accurately, and the method can be implemented in computer software and/or hardware so that identification and output are automatic. Used in actual translation work, the invention helps avoid translation errors involving such special words and improves translation accuracy. In addition, a lexicon of unconventional words can be built up gradually during translation and continually enriched by the identification process, so that, with the lexicon continually updated, fully automatic translation of all documents to be translated, including those containing unconventional words, can eventually be achieved.

Description

Method for identifying specific vocabulary
Technical field
The invention belongs to the field of vocabulary recognition, and in particular relates to a method for identifying specific vocabulary in a document to be translated.
Background
Translators frequently run into special words that are hard to translate. These words are neither conventional English words nor conventional Hanyu Pinyin words. When they are translated against existing, conventional translation corpora, it is difficult to find a rendering that matches the meaning of the original. As a result, both machine translation and human translation tend to go wrong, limited either by the corpus or by the translator's skill.
An example well known to translators is the translation of "Chiang Kai-shek". In the book "A Study of the Academic History of the Eastern Section of the Sino-Russian Boundary: The Eastern Section of the Sino-Russian Boundary as Seen by Chinese, Russian, and Western Scholars", published in October 2008 by the well-known history professor Wang Qi, Jiang Jieshi (蒋介石, written "Chiang Kai-shek" in Wade-Giles romanization) was rendered as "Chang Kaishen" (常凯申). This is not an isolated case: "Mencius" was once rendered as "Men Xiusi" (门修斯) by other well-known scholars, although the original refers to Mengzi (孟子). Evidently, handling such words in translation is a problem even for specialists, let alone for ordinary translators and machine translation tools.
Such special words therefore need special handling and cannot simply be translated as ordinary English or translated literally. Because the total number of such words is relatively small, one possible solution is to skip them during translation, keeping the original expression, to obtain a preliminary translation, and then to identify the special words for post-processing. Alternatively, the special words are identified before translation and marked for emphasis, so that the errors described above are avoided. This kind of special handling reduces the speed and quality of document translation, and manual processing devoted to a small number of special words is time-consuming and laborious.
Summary of the invention
To address these problems, the invention proposes a method for identifying special words that can accurately identify the special words in a document to be translated, so as to avoid translation errors.
The special words referred to here are mainly words that are neither conventional English words nor words formed according to the Hanyu Pinyin scheme.
"Conventional" English words here means words commonly encountered in ordinary language learning. For example, the conventional English word for 广州 is "Guangzhou", and a considerable number of people also know "Canton". For historical reasons, however, "Kwangchow" and "Kuang-chou" are also place names whose accurate translation is 广州 (Guangzhou), yet for most people these two words are "unconventional".
Likewise, "Mao Tse-tung", "I Ching", and "Chunghwa" do not conform to the Hanyu Pinyin scheme and also count as special words.
The inventors found through extensive study that most special words are nouns, including place names, personal names, and names of organizations. Restricting the identification of special words to nouns first therefore meets practical needs.
Accordingly, the identification method proposed by the invention first comprises the following step:
Segment the document to be translated, identify the nouns in it, and store all identified nouns in an ordered list according to their order of appearance in the document.
Many common algorithms exist in the art for segmenting a document and identifying its nouns. For example, the document is first split into sentences; each sentence is then analyzed semantically, including analysis of its constituents, to identify its structural parts such as subject, verb, and object, and nouns are sought in the object part. Alternatively, the prepositional parts are identified and nouns are sought at other specific positions outside the prepositions, such as the subject. As a further option, the degree of connection between words is analyzed, and whether it exceeds a threshold is used to decide whether the connected words, or the words before and after them, are nouns; or a dictionary, lexicon, or corpus is queried directly to decide whether a word is a noun. These algorithms are not described further here.
Not every identified noun is a special word, so some preprocessing can be applied to select the potential special words and reduce the subsequent workload.
Specifically, the following preprocessing can be applied:
Determine whether the noun contains a Latin letter; if it does not, the noun is not stored.
If it does, further determine whether the noun conforms to the Hanyu Pinyin scheme; if it does, the noun is not stored.
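The two preprocessing checks above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the test for the Hanyu Pinyin scheme (the "Scheme for the Chinese Phonetic Alphabet") uses an approximate syllable table that ignores tone marks and ü and accepts a few combinations that real pinyin does not have.

```python
import unicodedata

# Approximate pinyin finals, grouped by their leading vowel (an assumption
# made for this sketch; tone marks and the letter ü are not handled).
OPEN = "a o e ai ei ao ou an en ang eng ong".split()
I_GROUP = "i ia ie iao iu ian in iang ing iong".split()
U_GROUP = "u ua uo uai ui uan un uang".split()

# Finals each initial may take. The b..l row is deliberately loose.
ALLOWED = {}
for ini in "b p m f d t n l".split():
    ALLOWED[ini] = set(OPEN + I_GROUP + U_GROUP)
for ini in "g k h".split():
    ALLOWED[ini] = set(OPEN + U_GROUP)
for ini in "zh ch sh r z c s".split():
    ALLOWED[ini] = set(OPEN + U_GROUP + ["i"])
for ini in "j q x".split():
    ALLOWED[ini] = set(I_GROUP + ["u", "ue", "uan", "un"])
ALLOWED["y"] = set("a ao an ang e i in ing o ong ou u ue uan un".split())
ALLOWED["w"] = set("a o ai ei an en ang eng u".split())
ALLOWED[""] = set("a o e ai ei ao ou an en ang eng er".split())

SYLLABLES = {ini + fin for ini, fins in ALLOWED.items() for fin in fins}


def is_pinyin(word):
    """True if the whole word splits into syllables from the table above."""
    s = word.lower()
    if not (s.isascii() and s.isalpha()):
        return False
    reachable = [True] + [False] * len(s)
    for end in range(1, len(s) + 1):
        # The longest syllables ("zhuang", "chuang", "shuang") have 6 letters.
        for start in range(max(0, end - 6), end):
            if reachable[start] and s[start:end] in SYLLABLES:
                reachable[end] = True
                break
    return reachable[-1]


def contains_latin(noun):
    """True if at least one character of the noun is a Latin letter."""
    return any("LATIN" in unicodedata.name(ch, "") for ch in noun)


def prefilter(nouns):
    """Keep, in order, the nouns that survive both preprocessing checks."""
    return [n for n in nouns if contains_latin(n) and not is_pinyin(n)]


print(prefilter(["广州", "Guangzhou", "Kwangchow", "Mao Tse-tung", "Beijing"]))
```

In this sketch a noun containing a space or hyphen never passes `is_pinyin` as a whole, so it always stays in the list for the later checks.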
The nouns remaining in the ordered list after the above preprocessing are all potential special words and proceed to the next analysis step: read the nouns in the ordered list one by one and perform semantic analysis on each noun to determine whether it is a specific word;
At this point, the invention proceeds as follows: the noun is cut in units of bytes to obtain multiple feature fields; if at least one of the feature fields satisfies a predetermined condition, the noun is determined to be a specific word.
The invention is the first to propose this concrete way of identifying specific words. First, cutting the noun in units of bytes maximizes the accuracy of the resulting feature fields; second, testing whether the byte-level feature fields satisfy the predetermined condition brings out the "special" nature of the noun to the greatest extent.
As to the former, the feature fields obtained by cutting the noun in units of bytes consist of one or more of the following: Latin letters, spaces, diacritics, and hyphens.
As to the latter, satisfying the predetermined condition means satisfying at least one of the following conditions:
The feature fields contain multiple Latin letters and also contain a hyphen;
The feature fields contain multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
Through the above steps, the invention can identify at least special words such as "Mao Tse-tung", "Kuang-chou", "Chiang Kai-shek", and "Ch'eng T'ien-fang".
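The cutting step and the two conditions can be illustrated with a short Python sketch. Several choices here are assumptions rather than details from the patent: the noun is cut into Unicode characters after NFD normalization instead of raw bytes, the sets of hyphen and aspiration-mark characters are illustrative, and a diacritic counts as "above or at the upper right" of a letter when it immediately follows that letter.

```python
import unicodedata

# Characters treated as hyphens and as aspiration marks (assumed sets).
HYPHENS = {"-", "\u2010", "\u2011"}
ASPIRATION_MARKS = {"'", "`", "\u2018", "\u2019", "\u02bb", "\u02bc"}


def is_latin_letter(ch):
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")


def is_diacritic(ch):
    # Either an apostrophe-like mark or a Unicode combining mark.
    return ch in ASPIRATION_MARKS or unicodedata.combining(ch) != 0


def cut(noun):
    """Cut a noun into single-character feature fields.

    NFD normalization splits an accented letter into a base letter followed
    by a combining mark, so each diacritic becomes a field of its own.
    """
    return list(unicodedata.normalize("NFD", noun))


def meets_predetermined_condition(noun):
    fields = cut(noun)
    if sum(is_latin_letter(f) for f in fields) < 2:
        return False
    # Condition 1: multiple Latin letters plus a hyphen.
    if any(f in HYPHENS for f in fields):
        return True
    # Condition 2: multiple Latin letters plus a diacritic that directly
    # follows a Latin letter.
    return any(
        is_latin_letter(prev) and is_diacritic(cur)
        for prev, cur in zip(fields, fields[1:])
    )


for word in ["Mao Tse-tung", "Kuang-chou", "Ch'eng T'ien-fang", "Kwangchow", "I Ching"]:
    print(word, meets_predetermined_condition(word))
```

Words such as "Kwangchow" and "I Ching" contain neither a hyphen nor a diacritic, so this check alone does not flag them.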
The key to the term "diacritic" here is "added": it denotes a mark that would not appear under the conventional spelling. For example, English texts normally contain neither the aspiration marks (‘) (’) nor additional marks above a letter, at its upper right, or elsewhere.
The diacritics of the invention are therefore not limited to the aspiration marks (‘) (’), nor to marks located above or at the upper right of at least one Latin letter; they may also appear in other positions.
The predetermined condition above captures one of the most distinctive features of special words, but some words may still be missed, for example the "Kwangchow", "I Ching", and "Chunghwa" mentioned earlier. A further check is then needed: if the feature fields do not satisfy the predetermined condition, the following identification steps are carried out:
Determine whether the feature fields contain a space;
If there is no space, determine whether the string formed by the feature fields conforms to the Hanyu Pinyin scheme; if it does not, the noun is determined to be a specific word;
If there is a space, determine whether at least one of the two strings formed by the feature fields before and after the space fails to conform to the Hanyu Pinyin scheme; if so, the noun is determined to be a specific word.
By this criterion, "Kwangchow" and "Chunghwa" contain no space, but the strings they form do not conform to the Hanyu Pinyin scheme; "I Ching" contains a space, but "Ching" after the space does not conform to the scheme, and a lone "I" cannot form a pinyin syllable either.
The invention can therefore go on to identify such special words as well.
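The fallback check can be sketched in Python as follows. The pinyin test is passed in as a callable, and `toy_is_pinyin` is a stand-in lookup used only for this illustration; a real implementation would need a proper pinyin syllable validator. The function name and the splitting on any whitespace are assumptions of the sketch.

```python
def fallback_is_special(noun, is_pinyin):
    """Apply the space and pinyin checks to a noun that met no condition."""
    if " " not in noun:
        # No space: the noun is special if it is not valid pinyin as a whole.
        return not is_pinyin(noun)
    # With a space: the noun is special if any part fails the pinyin test.
    return any(not is_pinyin(part) for part in noun.split())


# Toy stand-in for a pinyin validator, for demonstration only.
TOY_PINYIN = {"guangzhou", "mao", "zedong"}


def toy_is_pinyin(word):
    return word.lower() in TOY_PINYIN


for noun in ["Kwangchow", "Chunghwa", "I Ching", "Guangzhou", "Mao Zedong"]:
    print(noun, fallback_is_special(noun, toy_is_pinyin))
```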
As can be seen, the identification method proposed by the invention can be carried out automatically by a computer program. With this method, most special words in a document to be translated can be identified accurately.
In another aspect, the invention also provides a specific-vocabulary identification system for identifying specific words in a document to be translated, each specific word containing at least one Latin letter. The system comprises the following modules:
Identification module: segments the document to be translated, and identifies and outputs the nouns in it;
Preprocessing module: preprocesses the nouns output by the identification module; the preprocessing includes determining whether the noun contains a Latin letter and determining whether the noun conforms to the Hanyu Pinyin scheme;
Storage module: stores the preprocessed nouns in an ordered list according to their order of appearance in the document to be translated;
Semantic analysis module: reads the nouns in the ordered list one by one and performs semantic analysis on each noun to determine whether it is a specific word;
characterized in that the semantic analysis module comprises a byte cutting module, a judgment module, and a result output module:
the byte cutting module cuts the noun in units of bytes to obtain multiple feature fields;
the judgment module determines whether at least one of the feature fields satisfies the predetermined condition;
the result output module outputs the identification result for the word according to the judgment module.
The above identification system can be used to carry out the identification method proposed above; it comprises the corresponding functional modules and is implemented in computer hardware or software. When implemented in software, the method can be realized by a computer-readable storage medium storing computer-readable instructions that are executed by a memory and a processor.
It should be noted that whether a word is a "specific word" in the sense of the invention is relative not only to conventional vocabulary but also to the translator's current level of knowledge. For example, when the well-known history professor Wang Qi translated "Chiang Kai-shek", the term was, at the level of knowledge at that time, a "specific word" as defined by the invention. With the wide spread of culture and the passage of time, however, "Chiang Kai-shek" is today no longer a specific word even for an ordinary person skilled in the art, but a widely known one, because the relevant translation corpora and translation tools have all stored its correct translation, 蒋介石 (Jiang Jieshi). The same holds for "Mencius", which existing translation tools can correctly recognize and translate as 孟子 (Mengzi).
However, just as when "Chiang Kai-shek" and "Mencius" were first translated, many documents to be translated still contain, for historical reasons, a large number of similar specific words. When such a word is translated for the first time, the translator may still make mistakes for lack of any reference, and existing translation corpora and tools cannot anticipate such cases in advance. The method of the invention is therefore still needed to identify specific words continually during translation.
For an identified specific word, it can be determined whether an accurate translation already exists. For example, a specific-vocabulary corpus can be built to store the existing translations of specific words; newly identified specific words are continually added to it, thereby updating the specific-vocabulary translation corpus.
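A specific-vocabulary corpus of this kind could be as simple as a mapping from each word to its translation. The sketch below is a hypothetical in-memory version in Python; the class and method names are illustrative and do not come from the patent.

```python
class SpecialLexicon:
    """In-memory store of specific words and their known translations."""

    def __init__(self):
        # Maps each specific word to its translation, or None if unknown.
        self._entries = {}

    def lookup(self, term):
        """Return the stored translation, or None if there is none yet."""
        return self._entries.get(term)

    def add(self, term, translation=None):
        """Record a word; never overwrite a known translation with None."""
        if translation is not None or term not in self._entries:
            self._entries[term] = translation

    def pending(self):
        """Words that were identified but still lack a translation."""
        return [t for t, tr in self._entries.items() if tr is None]


lexicon = SpecialLexicon()
lexicon.add("Chiang Kai-shek", "蒋介石")
lexicon.add("Kwangchow")          # newly identified, not yet translated
lexicon.add("Chiang Kai-shek")    # identified again; translation is kept
print(lexicon.lookup("Chiang Kai-shek"), lexicon.pending())
```

Words returned by `pending()` are the ones a translator would still need to resolve before the entry can be reused automatically.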
Therefore, with the method and system of the invention, most of the specific unconventional words that occur during translation can be identified accurately, and the method can be implemented in computer software and/or hardware so that identification and output are automatic. Used in actual translation work, the invention helps avoid translation errors involving such special words and improves translation accuracy. In addition, a lexicon of unconventional words can be built up gradually during translation and continually enriched by the identification process, so that, with the lexicon continually updated, fully automatic translation of all documents to be translated, including those containing unconventional words, can eventually be achieved.
Brief description of the drawings
Fig. 1 is a flow chart of the identification method of the invention.
Fig. 2 is a block diagram of the identification system of the invention.
Embodiments
Referring to Fig. 1, the steps of the identification method proposed by the invention are as follows:
S1: Segment the document to be translated and identify the nouns in it;
S2: Determine whether the current noun contains a Latin letter; if not, the noun is not stored and the next noun is examined; otherwise go to step S3;
S3: Determine whether the noun conforms to the Hanyu Pinyin scheme; if it does, the noun is not stored and the next noun is examined; otherwise go to step S4;
S4: Store the nouns so identified in an ordered list according to their order of appearance in the document to be translated;
S5: Read the nouns in the ordered list in sequence;
S6: Cut the noun in units of bytes to obtain multiple feature fields;
S7: Determine whether at least one of the feature fields satisfies the predetermined condition; if so, output the noun as a special word; otherwise read the next noun and continue, until all nouns in the ordered list have been examined.
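Steps S2 to S7 amount to a filter followed by a loop. The Python sketch below assumes the nouns from S1 are already available and that the three predicates are supplied by the caller; all names are hypothetical, and the lambdas in the demonstration are toy stand-ins rather than real checks.

```python
def identify_special_words(nouns, contains_latin, is_pinyin, meets_condition):
    """Return the nouns judged special, in document order (steps S2 to S7)."""
    # S2 to S4: store only nouns that contain a Latin letter and are not pinyin.
    ordered_list = [n for n in nouns if contains_latin(n) and not is_pinyin(n)]
    special = []
    # S5: read the stored nouns in sequence.
    for noun in ordered_list:
        # S6: cut the noun into single-character feature fields.
        fields = list(noun)
        # S7: output the noun if the fields satisfy the predetermined condition.
        if meets_condition(fields):
            special.append(noun)
    return special


# Toy predicates, for demonstration only.
demo = identify_special_words(
    ["广州", "Guangzhou", "Kuang-chou", "Kwangchow"],
    contains_latin=lambda n: any(c.isascii() and c.isalpha() for c in n),
    is_pinyin=lambda n: n.lower() in {"guangzhou"},
    meets_condition=lambda fields: "-" in fields,
)
print(demo)
```

With these toy predicates "Kwangchow" passes the filter but is not output, because it contains no hyphen; a word like this is only caught by the additional space and pinyin check.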
The steps shown in Fig. 1 are only one specific implementation of the method of the invention. In practice, steps S2 and S3 can be swapped; in the current order, S3 can be moved after step S4, and step S2 can likewise be moved after step S4; S2 or S3 can also be performed after the result of step S7 is negative. Those skilled in the art will understand that these steps can be split or merged in different combinations, as long as special words can ultimately be identified according to the predetermined condition.
For example, the method may skip the step S3 check at the outset and, on reaching the point where the feature fields do not satisfy the predetermined condition, carry out the following identification steps:
Determine whether the feature fields contain a space;
If there is no space, determine whether the string formed by the feature fields conforms to the Hanyu Pinyin scheme; if it does not, the noun is determined to be a specific word;
If there is a space, determine whether at least one of the two strings formed by the feature fields before and after the space fails to conform to the Hanyu Pinyin scheme; if so, the noun is determined to be a specific word.
Fig. 2 shows the identification system of the invention, which comprises the following modules:
Identification module: segments the document to be translated, and identifies and outputs the nouns in it;
Preprocessing module: preprocesses the nouns output by the identification module; the preprocessing includes determining whether the noun contains a Latin letter and determining whether the noun conforms to the Hanyu Pinyin scheme;
Storage module: stores the preprocessed nouns in an ordered list according to their order of appearance in the document to be translated;
Semantic analysis module: reads the nouns in the ordered list one by one and performs semantic analysis on each noun to determine whether it is a specific word;
characterized in that the semantic analysis module comprises a byte cutting module, a judgment module, and a result output module:
the byte cutting module cuts the noun in units of bytes to obtain multiple feature fields;
the judgment module determines whether at least one of the feature fields satisfies the predetermined condition;
the result output module outputs the identification result for the word according to the judgment module.
Overall, with the method and system of the invention, most of the specific unconventional words that occur during translation can be identified accurately, and the method can be implemented in computer software and/or hardware so that identification and output are automatic. Used in practical work, the invention helps avoid translation errors like those described in the Background and improves translation accuracy. In addition, a lexicon of unconventional words can be built up gradually during translation and continually enriched by the identification process, so that, with the lexicon continually updated, fully automatic translation of all documents to be translated, including those containing unconventional words, can eventually be achieved.

Claims (10)

1. A method for identifying specific vocabulary in a document to be translated, the specific vocabulary containing at least one Latin letter, the method comprising the following steps:
(1) segmenting the document to be translated, identifying the nouns in it, and storing all identified nouns in an ordered list according to their order of appearance in the document to be translated;
(2) reading the nouns in the ordered list one by one and performing semantic analysis on each noun to determine whether the noun is a specific word;
characterized in that
in step (2), performing semantic analysis on the noun to determine whether the noun is a specific word specifically comprises:
(21) cutting the noun in units of bytes to obtain multiple feature fields;
(22) if at least one of the feature fields satisfies a predetermined condition, determining that the noun is a specific word.
2. The method of claim 1, wherein in step (2) the feature fields obtained by cutting the noun in units of bytes consist of one or more of the following fields: Latin letters, spaces, diacritics, and hyphens.
3. The method of claim 2, wherein satisfying the predetermined condition means satisfying at least one of the following conditions:
(31) the feature fields contain multiple Latin letters and also contain a hyphen;
(32) the feature fields contain multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
4. The method of claim 3, further comprising, if the feature fields do not satisfy the predetermined condition, carrying out the following identification steps:
(41) determining whether the feature fields contain a space;
(42) if there is no space, determining whether the string formed by the feature fields conforms to the Hanyu Pinyin scheme, and if it does not, determining that the noun is a specific word;
(43) if there is a space, determining whether at least one of the two strings formed by the feature fields before and after the space fails to conform to the Hanyu Pinyin scheme, and if so, determining that the noun is a specific word.
5. The method of claim 1, wherein storing all identified nouns in an ordered list according to their order of appearance in the document to be translated further comprises a preprocessing step: determining whether the noun contains a Latin letter, and if it does not, not storing the noun.
6. The method of claim 5, wherein it is determined whether the noun contains a Latin letter; if it does, it is further determined whether the noun conforms to the Hanyu Pinyin scheme, and if it does, the noun is not stored.
7. A specific-vocabulary identification system for identifying specific words in a document to be translated, each specific word containing at least one Latin letter, the system comprising the following modules:
an identification module, which segments the document to be translated, and identifies and outputs the nouns in it;
a preprocessing module, which preprocesses the nouns output by the identification module, the preprocessing including determining whether the noun contains a Latin letter and determining whether the noun conforms to the Hanyu Pinyin scheme;
a storage module, which stores the preprocessed nouns in an ordered list according to their order of appearance in the document to be translated;
a semantic analysis module, which reads the nouns in the ordered list one by one and performs semantic analysis on each noun to determine whether the noun is a specific word;
characterized in that the semantic analysis module comprises a byte cutting module, a judgment module, and a result output module,
the byte cutting module cutting the noun in units of bytes to obtain multiple feature fields;
the judgment module determining whether at least one of the feature fields satisfies a predetermined condition;
the result output module outputting the identification result for the word according to the judgment module.
8. The system of claim 7, wherein the feature fields obtained by the byte cutting module by cutting the noun in units of bytes consist of one or more of the following fields: Latin letters, spaces, diacritics, and hyphens.
9. The system of claim 7, wherein satisfying the predetermined condition means satisfying at least one of the following conditions:
(91) the feature fields contain multiple Latin letters and also contain a hyphen;
(92) the feature fields contain multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
10. A computer-readable storage medium storing computer-readable instructions which, when executed by a memory and a processor, implement the method of any one of claims 1 to 6.
CN201711253593.2A 2017-12-04 2017-12-04 Method for identifying specific vocabulary Active CN107870905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711253593.2A CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711253593.2A CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Publications (2)

Publication Number Publication Date
CN107870905A (en) 2018-04-03
CN107870905B (en) 2021-09-17

Family

ID=61755073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711253593.2A Active CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Country Status (1)

Country Link
CN (1) CN107870905B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241543A (en) * 2018-09-19 2019-01-18 传神语联网网络科技股份有限公司 The preconditioning technique of consistency translationese

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030229487A1 (en) * 2002-06-11 2003-12-11 Fuji Xerox Co., Ltd. System for distinguishing names of organizations in Asian writing systems
US20120123766A1 (en) * 2007-03-22 2012-05-17 Konstantin Anisimovich Indicating and Correcting Errors in Machine Translation Systems
CN102708147A (en) * 2012-03-26 2012-10-03 北京新发智信科技有限责任公司 Recognition method for new words of scientific and technical terminology
CN104572625A (en) * 2015-01-21 2015-04-29 北京云知声信息技术有限公司 Recognition method of named entity
CN104572632A (en) * 2014-12-25 2015-04-29 语联网(武汉)信息技术有限公司 Method for determining translation direction of word with proper noun translation
CN106168946A (en) * 2016-06-24 2016-11-30 中国科学院信息工程研究所 A kind of method identifying user initials phenomenon
CN107247708A (en) * 2017-07-03 2017-10-13 中国银行股份有限公司 A kind of Sex criminals method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
丁科家: "威妥玛式拼音法与汉语专有名词的翻译", 《英语知识》 *
杨继秋: "从"蒋介石改名了"所想到的", 《贵州省翻译工作者协会2009年会暨学术研讨会》 *

Also Published As

Publication number Publication date
CN107870905B (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant