CN107870905A - Method for identifying specific vocabulary - Google Patents
Method for identifying specific vocabulary
- Publication number
- CN107870905A (application CN201711253593.2A)
- Authority
- CN
- China
- Prior art keywords
- noun
- module
- vocabulary
- cutting
- multiple feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method, a system, and a computer-readable medium for identifying specific vocabulary in a document to be translated. With the method and system of the invention, most of the specific unconventional vocabulary that occurs during translation can be accurately identified, and the method can be implemented in computer software and/or hardware so that the identified vocabulary is output automatically. Used in actual translation work, the invention helps avoid translation errors involving such special words and improves translation accuracy. In addition, a lexicon of unconventional vocabulary can be built up gradually during translation and continuously enriched by the identification process; with this continuously updated lexicon, fully automatic translation of all documents to be translated, including their unconventional vocabulary, can eventually be achieved.
Description
Technical field
The invention belongs to the field of vocabulary recognition, and in particular relates to a method for identifying specific vocabulary in a document to be translated.
Background art
Translators frequently encounter special words that are difficult to translate. These words are neither conventional English words nor conventional Hanyu Pinyin words. When they are translated with the help of existing conventional translation corpora, it is hard to find a corresponding translation that matches the meaning of the original. As a result, in both machine translation and human translation, deviations are hard to avoid, owing either to the limits of the corpus or to the limits of the translator's knowledge.
An example familiar to translators is the translation of "Chiang Kai-shek". In the book "A Study of the Academic History of the Eastern Section of the Sino-Russian Border: The Eastern Section of the Sino-Russian Border as Seen by Chinese, Russian, and Western Scholars", published in October 2008 by the well-known history professor Wang Qi, Jiang Jieshi (蒋介石, written "Chiang Kai-shek" in the Wade-Giles romanization of the original text) was rendered as "Chang Kaishen" (常凯申). This is not an isolated case: "Mencius" was once rendered by other well-known scholars as "Men Xiusi" (门修斯), although the original refers to Mengzi (孟子). Handling such vocabulary in translation is thus a problem even for experts in the field, let alone for the large body of ordinary translators and for machine translation tools.
The translation of such special words therefore requires special handling; they cannot simply be translated as ordinary English words, still less translated mechanically. Because the total number of such words is relatively small, one possible solution is to skip them during translation and keep the original expression, obtain a preliminary translation, and then identify the special words for post-processing. Alternatively, the special words can be identified before translation and marked for emphasis, so that the translation errors described above do not occur. Such special handling, however, reduces the speed and quality of document translation, and manual processing devoted to a small number of special words is time-consuming and laborious.
Summary of the invention
In view of the above problems, the invention proposes a method for identifying special words. The method can accurately identify the special words in a document to be translated, so as to avoid translation errors.
The special words referred to here are mainly words that are neither conventional English words nor words that conform to the Scheme for the Chinese Phonetic Alphabet (Hanyu Pinyin).
"Conventional" English words here means words commonly encountered in ordinary language study. For example, the conventional English word for the city of Guangzhou (广州) is "Guangzhou", and a considerable number of people also understand "Canton". For historical reasons, however, "Kwangchow" and "Kuang-chou", when used as place names, should also be translated as Guangzhou; yet for most people these two words are "unconventional".
Likewise, "Mao Tse-tung", "I Ching", and "Chunghwa" do not conform to the Scheme for the Chinese Phonetic Alphabet and are therefore also special words.
Through extensive investigation, the inventor found that most special words are nouns, including place names, personal names, organization names, and the like. Restricting the scope of identification to nouns first therefore meets the needs of practical work.
Accordingly, the identification method proposed by the invention first comprises the following step:
Segment the document to be translated, identify the nouns in it, and store all identified nouns in an ordered list according to their order of position in the document to be translated.
Many common algorithms exist in the art for segmenting a document to be translated and identifying the nouns in it. For example, the document is first split into sentences; each sentence is then analyzed semantically, including sentence-constituent analysis, to identify its structural parts (such as subject, verb, and object), and nouns are then sought in the object part. Alternatively, the prepositional parts are identified, and nouns are identified at other specific positions outside the prepositions, such as the subject. As a further alternative, the degree of connection between different words is analyzed, and whether that degree exceeds a certain threshold is used to judge whether the connected words are nouns, or whether the words before and after them are nouns. A dictionary, lexicon, or corpus can also be queried directly to determine whether a word is a noun. These approaches are not described further here.
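The storage step can be sketched as follows. This is only a minimal illustration: noun recognition itself is left to whichever algorithm is chosen, so the sketch assumes a hypothetical, caller-supplied noun lexicon (`NOUN_LEXICON`) and a hypothetical helper `collect_nouns`, and simply records each match with its character offset so that the resulting list follows the order of the document.

```python
import re

# Hypothetical noun lexicon (an assumption of this sketch); a real system
# would use a dictionary, a corpus lookup, or a syntactic analyser.
NOUN_LEXICON = ["Chiang Kai-shek", "Kwangchow", "Guangzhou", "river"]


def collect_nouns(text, lexicon=NOUN_LEXICON):
    """Return (offset, noun) pairs sorted by position in the text."""
    found = []
    for noun in lexicon:
        # \b keeps "river" from matching inside a longer word such as "driver".
        pattern = r"\b" + re.escape(noun) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), noun))
    found.sort()  # order by offset, i.e. by position in the document
    return found


sample = "Kwangchow lies on a river; Chiang Kai-shek studied in Kwangchow."
ordered_nouns = collect_nouns(sample)
```

Keeping the offset alongside each noun makes the ordering explicit and lets repeated occurrences of the same noun appear more than once in the list.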
Not every identified noun is a special word. Some preprocessing can therefore be applied to filter for potential special words and reduce the subsequent workload.
Specifically, the following preprocessing can be used:
Determine whether the noun contains Latin letters; if it does not, the noun is not stored.
If it does, further determine whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet; if it does, the noun is not stored.
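A minimal sketch of this preprocessing filter is shown below. The text does not specify how conformance to the Scheme for the Chinese Phonetic Alphabet is tested, so the sketch assumes a simple approach: a word conforms if it can be split completely into known pinyin syllables. The syllable set `DEMO_SYLLABLES` is a deliberately tiny, illustrative subset, and all function names are hypothetical.

```python
# Tiny illustrative subset of pinyin syllables (an assumption of this
# sketch); a real check would need the full syllable inventory.
DEMO_SYLLABLES = {"guang", "zhou", "bei", "jing", "shang", "hai"}


def conforms_to_pinyin(word, syllables=DEMO_SYLLABLES):
    """True if `word` can be split completely into known pinyin syllables."""
    word = word.lower()
    if not word.isalpha():
        return False
    # reachable[i] is True when word[:i] splits into syllables.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        # Pinyin syllables have at most six letters (e.g. "zhuang").
        for start in range(max(0, end - 6), end):
            if reachable[start] and word[start:end] in syllables:
                reachable[end] = True
                break
    return reachable[len(word)]


def has_latin_letter(noun):
    """True if the noun contains at least one basic Latin letter."""
    return any(ch.isascii() and ch.isalpha() for ch in noun)


def preprocess(nouns):
    """Keep only nouns that contain Latin letters and are not plain pinyin."""
    kept = []
    for noun in nouns:
        if not has_latin_letter(noun):
            continue  # no Latin letter: not stored
        # Assumption: space-separated parts are checked one by one.
        if all(conforms_to_pinyin(part) for part in noun.split()):
            continue  # ordinary pinyin: not stored
        kept.append(noun)
    return kept


candidates = preprocess(["广州", "Guangzhou", "Kwangchow", "Kuang-chou", "Bei jing"])
```

With the demo syllable set, only "Kwangchow" and "Kuang-chou" survive the filter; the quality of the result depends entirely on how complete the syllable inventory is.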
After the above preprocessing, the nouns remaining in the ordered list are all potential special words and proceed to the next analysis step: the nouns in the ordered list are read in sequence, and semantic analysis is performed on each noun to determine whether it belongs to the specific vocabulary.
The means and criterion adopted by the invention at this point are as follows: the noun is segmented byte by byte to obtain multiple feature fields; if at least one of the feature fields meets a predetermined condition, the noun is determined to belong to the specific vocabulary.
The invention proposes, for the first time, a concrete method for identifying specific vocabulary. First, segmenting the noun byte by byte ensures the greatest accuracy of the resulting feature fields; second, testing whether the byte-level feature fields meet a predetermined condition captures the "special" nature of the noun to the greatest extent.
Regarding the former, the multiple feature fields obtained by segmenting the noun byte by byte consist of one or more of the following fields: Latin letters, spaces, diacritics, and connectors (hyphens).
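The segmentation into feature fields can be sketched as follows. The sketch works on Unicode characters rather than raw bytes, which is an assumption made here because splitting a UTF-8 string byte by byte would cut multi-byte characters apart. The particular sets of aspiration marks and connectors, and the names `classify` and `feature_fields`, are likewise illustrative choices rather than part of the described method.

```python
import unicodedata

# Illustrative character sets (assumptions of this sketch).
ASPIRATION_MARKS = {"'", "\u2018", "\u2019", "\u02bb", "\u02bc"}
CONNECTORS = {"-", "\u2010", "\u2011"}


def classify(ch):
    """Map one character to a feature-field kind."""
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch == " ":
        return "space"
    if ch in CONNECTORS:
        return "connector"
    if ch in ASPIRATION_MARKS or unicodedata.combining(ch):
        return "diacritic"
    return "other"


def feature_fields(noun):
    """Split a noun into (character, kind) pairs."""
    # NFD decomposition turns a precomposed letter such as "ü" into a base
    # letter followed by a combining mark, so the mark becomes its own field.
    decomposed = unicodedata.normalize("NFD", noun)
    return [(ch, classify(ch)) for ch in decomposed]


kinds = [kind for _, kind in feature_fields("Ch'eng")]
```

Decomposing first means that marks printed above a letter and marks printed at its upper right are handled by the same classification step.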
Regarding the latter, meeting the predetermined condition means meeting at least one of the following conditions:
The multiple feature fields include multiple Latin letters and also include a connector;
The multiple feature fields include multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
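The two conditions above can be expressed as a single predicate. The following is a simplified sketch under stated assumptions: "multiple Latin letters" is read as "at least two", the connector is taken to be the ASCII hyphen, diacritics are taken to be apostrophe-like aspiration marks or Unicode combining marks, and the position of the diacritic relative to its letter is not checked. The function name is hypothetical.

```python
import unicodedata

CONNECTORS = {"-"}                              # assumed connector set
ASPIRATION_MARKS = {"'", "\u2018", "\u2019"}    # assumed aspiration marks


def meets_predetermined_condition(noun):
    """True if the noun has several Latin letters plus a connector or diacritic."""
    chars = unicodedata.normalize("NFD", noun)
    letters = sum(1 for ch in chars if ch.isascii() and ch.isalpha())
    has_connector = any(ch in CONNECTORS for ch in chars)
    has_diacritic = any(
        ch in ASPIRATION_MARKS or unicodedata.combining(ch) for ch in chars
    )
    return letters >= 2 and (has_connector or has_diacritic)


flagged = [
    noun
    for noun in ["Mao Tse-tung", "Kuang-chou", "Ch'eng T'ien-fang", "Kwangchow"]
    if meets_predetermined_condition(noun)
]
```

Taken on its own, such a predicate would also flag ordinary hyphenated or possessive English nouns, so in practice it presumably relies on the noun identification and preprocessing steps to narrow the candidates first.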
Through the above steps, the invention can identify at least special words such as "Mao Tse-tung", "Kuang-chou", "Chiang Kai-shek", and "Ch'eng T'ien-fang".
The emphasis of the term "diacritic" as used here is on "additional": it should be understood that, under conventional spelling, such a symbol would not appear. For example, English text does not normally contain the aspiration marks (‘) (’), nor does it carry additional marks above a letter, at its upper right, or in other positions.
Accordingly, the diacritics of the invention are not limited to the aspiration marks (‘) (’), nor to other symbols located above or at the upper right of at least one Latin letter; they may also appear in other positions.
The above predetermined condition is one of the most distinctive features of special words. Omissions are still possible, however, for example the previously mentioned "Kwangchow", "I Ching", and "Chunghwa". A further judgment is then needed: if the multiple feature fields do not meet the predetermined condition, the following identification steps continue:
Determine whether the multiple feature fields include a space;
If no space is included, determine whether the character string formed by the multiple feature fields conforms to the Scheme for the Chinese Phonetic Alphabet; if it does not, the noun is determined to belong to the specific vocabulary;
If a space is included, determine whether at least one of the two character strings formed by the feature fields before and after the space fails to conform to the Scheme for the Chinese Phonetic Alphabet; if so, the noun is determined to belong to the specific vocabulary.
By this standard, "Kwangchow" and "Chunghwa" contain no space, but the characters that compose them do not conform to the Scheme for the Chinese Phonetic Alphabet; "I Ching" contains a space, but "Ching" after the space does not conform to the Scheme, and a lone "I" cannot form a pinyin syllable either.
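The further judgment can be sketched as a small function that takes the pinyin test as a parameter. The predicate `toy_is_pinyin` below is a hypothetical stand-in that only knows three strings and exists purely for the demonstration; the sketch also generalizes "the two strings before and after the space" to any number of space-separated parts, which is an assumption.

```python
def fallback_is_special(noun, is_pinyin):
    """Space / pinyin test for a noun that failed the predetermined condition."""
    parts = noun.split(" ")
    if len(parts) == 1:
        # No space: special if the whole string is not valid pinyin.
        return not is_pinyin(noun)
    # With spaces: special if at least one part is not valid pinyin.
    return any(not is_pinyin(part) for part in parts if part)


# Toy predicate for the demo only (hypothetical); a real one would validate
# syllables against the full pinyin inventory.
KNOWN_PINYIN = {"guangzhou", "bei", "jing"}


def toy_is_pinyin(word):
    return word.lower() in KNOWN_PINYIN


results = {
    noun: fallback_is_special(noun, toy_is_pinyin)
    for noun in ["Kwangchow", "Chunghwa", "I Ching", "Guangzhou", "Bei jing"]
}
```

Passing the predicate in keeps the decision logic independent of how pinyin conformance is actually tested.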
The invention can therefore go on to identify such special words as well.
The identification method proposed above can be carried out automatically by a computer program. With this method, most special words in a document to be translated can be accurately identified.
In another aspect, the invention further provides a specific-vocabulary identification system for identifying specific vocabulary in a document to be translated, the specific vocabulary including at least one Latin letter; the system comprises the following modules:
Identification module: segments the document to be translated, and identifies and outputs the nouns in it;
Preprocessing module: preprocesses the nouns output by the segmentation module; the preprocessing includes determining whether the noun contains Latin letters, and determining whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet;
Storage module: stores the nouns processed by the preprocessing module in an ordered list according to their order of position in the document to be translated;
Semantic analysis module: reads the nouns in the ordered list in sequence and performs semantic analysis on each noun to determine whether it belongs to the specific vocabulary;
characterized in that the semantic analysis module comprises a byte segmentation module, a judgment module, and a result output module;
The byte segmentation module segments the noun byte by byte to obtain multiple feature fields;
The judgment module determines whether at least one of the multiple feature fields meets the predetermined condition;
The result output module outputs the identification result for the vocabulary according to the judgment module.
The above identification system can be used to carry out the identification method proposed above; it comprises the corresponding functional modules and is implemented in computer hardware or software. When implemented in software, the method can be realized by means of a computer-readable storage medium on which computer-readable instructions are stored, the instructions being executed by a memory and a processor.
It should be pointed out that the specific vocabulary referred to by the invention is defined not only relative to conventional vocabulary, but also relative to the translator's current level of knowledge. For example, when the well-known history professor Wang Qi translated "Chiang Kai-shek", the term was, for the level of knowledge at that time, a "specific vocabulary" item as defined by the invention. With the wide spread of culture and the passage of time, however, "Chiang Kai-shek" no longer counts as specific vocabulary even for a person of ordinary skill in the art; it has become a common word, because the relevant translation corpora and translation tools have all stored its correct translation, Jiang Jieshi (蒋介石). The same is true of "Mencius", which existing translation tools can correctly identify and translate as Mengzi (孟子).
However, just as when "Chiang Kai-shek" and "Mencius" were first translated, many documents to be translated still contain, for historical reasons, a large number of similar specific vocabulary items. When such a word is translated for the first time, the translator may still make mistakes for lack of any reference; likewise, existing translation corpora and translation tools cannot anticipate such cases in advance. The method of the invention is therefore still needed to identify specific vocabulary continuously during translation.
For an identified specific word, it can be determined whether an accurate translation already exists. For example, a specific-vocabulary corpus can be established in which existing translations of specific vocabulary are stored; newly identified specific words are continuously added to it, thereby updating the specific-vocabulary translation corpus.
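One way to picture such a corpus is a small in-memory store, sketched below. The class name `SpecificVocabularyCorpus` and its methods are hypothetical, and persistence, review workflows, and conflict handling are all left out; the sketch only shows the two operations the text mentions, namely looking up an existing translation and adding newly identified words.

```python
class SpecificVocabularyCorpus:
    """Minimal in-memory store of specific words and their translations."""

    def __init__(self):
        # term -> translation, or None while no accurate translation exists
        self._translations = {}

    def lookup(self, term):
        """Return the stored translation, or None if there is none yet."""
        return self._translations.get(term)

    def register(self, term):
        """Add a newly identified term; return True if it was not yet known."""
        if term in self._translations:
            return False
        self._translations[term] = None
        return True

    def set_translation(self, term, translation):
        """Store or update the accurate translation of a term."""
        self._translations[term] = translation

    def pending(self):
        """Terms that have been identified but not yet translated."""
        return sorted(t for t, tr in self._translations.items() if tr is None)


corpus = SpecificVocabularyCorpus()
corpus.set_translation("Chiang Kai-shek", "蒋介石")
was_new = corpus.register("Kwangchow")
```

The `pending` list is where newly identified words would wait until a translator supplies an accurate translation.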
Therefore, with the method and system of the invention, most of the specific unconventional vocabulary that occurs during translation can be accurately identified, and the method can be implemented in computer software and/or hardware so that the identified vocabulary is output automatically. Used in actual translation work, the invention helps avoid translation errors involving such special words and improves translation accuracy. In addition, a lexicon of unconventional vocabulary can be built up gradually during translation and continuously enriched by the identification process; with this continuously updated lexicon, fully automatic translation of all documents to be translated, including their unconventional vocabulary, can eventually be achieved.
Brief description of the drawings
Fig. 1 is a flow chart of an identification method of the invention.
Fig. 2 is a block diagram of the identification system of the invention.
Detailed description
Referring to Fig. 1, the steps of the identification method proposed by the invention are as follows:
S1: Segment the document to be translated and identify the nouns in it;
S2: Determine whether the current noun contains Latin letters; if it does not, the noun is not stored and the next noun is examined; otherwise go to step S3;
S3: Determine whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet; if it does, the noun is not stored and the next noun is examined; otherwise go to step S4;
S4: Store all identified nouns in an ordered list according to their order of position in the document to be translated;
S5: Read the nouns in the ordered list in sequence;
S6: Segment the noun byte by byte to obtain multiple feature fields;
S7: Determine whether at least one of the multiple feature fields meets the predetermined condition; if so, output the noun as a special word; otherwise read the next noun and continue, until all nouns in the ordered list have been examined.
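Steps S2 to S7 can be strung together as in the sketch below, which assumes that step S1 has already produced the nouns in document order. The pinyin test is passed in as a parameter, the demo predicate `toy_is_pinyin` is a hypothetical stand-in, and the character sets used for connectors and diacritics are illustrative assumptions; this is one possible reading of the flow, not a definitive implementation.

```python
import unicodedata

MARKS = {"-", "'", "\u2018", "\u2019"}  # assumed connector and aspiration marks


def identify_special_words(nouns, is_pinyin):
    """Run steps S2-S7 over nouns already extracted in document order (S1)."""
    ordered = []
    for noun in nouns:
        # S2: skip nouns without any Latin letter.
        if not any(ch.isascii() and ch.isalpha() for ch in noun):
            continue
        # S3: skip nouns that are ordinary pinyin.
        if is_pinyin(noun):
            continue
        ordered.append(noun)  # S4: store in document order

    special = []
    for noun in ordered:  # S5: read the list in sequence
        fields = unicodedata.normalize("NFD", noun)  # S6: split into fields
        letters = sum(1 for ch in fields if ch.isascii() and ch.isalpha())
        marked = any(ch in MARKS or unicodedata.combining(ch) for ch in fields)
        if letters >= 2 and marked:  # S7: predetermined condition
            special.append(noun)
    return special


def toy_is_pinyin(word):
    # Demo-only predicate (hypothetical): knows a single pinyin word.
    return word.lower() in {"guangzhou"}


found = identify_special_words(
    ["Guangzhou", "Kuang-chou", "Kwangchow", "广州", "Mao Tse-tung"], toy_is_pinyin
)
```

In this flow "Kwangchow" passes the preprocessing but fails step S7, so it is only caught when the further space and pinyin judgment is applied after a negative result in S7.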
The steps shown in Fig. 1 are only one specific implementation of the method of the invention. In practice, the order of steps S2 and S3 can be exchanged; in the current order, S3 can be moved to after step S4, and step S2 can also be moved to after step S4; likewise, S2 or S3 can be performed after the judgment in step S7 turns out negative. Those skilled in the art will understand that these steps can be separated or combined in different ways, as long as special words can ultimately be identified according to the predetermined condition.
For example, the method need not perform the judgment of step S3 at the outset; instead, upon reaching the stage where "the multiple feature fields do not meet the predetermined condition", it continues with the following identification steps:
Determine whether the multiple feature fields include a space;
If no space is included, determine whether the character string formed by the multiple feature fields conforms to the Scheme for the Chinese Phonetic Alphabet; if it does not, the noun is determined to belong to the specific vocabulary;
If a space is included, determine whether at least one of the two character strings formed by the feature fields before and after the space fails to conform to the Scheme for the Chinese Phonetic Alphabet; if so, the noun is determined to belong to the specific vocabulary.
Fig. 2 shows the identification system of the invention, which comprises the following modules:
Identification module: segments the document to be translated, and identifies and outputs the nouns in it;
Preprocessing module: preprocesses the nouns output by the segmentation module; the preprocessing includes determining whether the noun contains Latin letters, and determining whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet;
Storage module: stores the nouns processed by the preprocessing module in an ordered list according to their order of position in the document to be translated;
Semantic analysis module: reads the nouns in the ordered list in sequence and performs semantic analysis on each noun to determine whether it belongs to the specific vocabulary;
characterized in that the semantic analysis module comprises a byte segmentation module, a judgment module, and a result output module;
The byte segmentation module segments the noun byte by byte to obtain multiple feature fields;
The judgment module determines whether at least one of the multiple feature fields meets the predetermined condition;
The result output module outputs the identification result for the vocabulary according to the judgment module.
On the whole, with the method and system of the invention, most of the specific unconventional vocabulary that occurs during translation can be accurately identified, and the method can be implemented in computer software and/or hardware so that the identified vocabulary is output automatically. Used in practical work, the invention helps avoid translation errors of the kind described in the background section and improves the accuracy of translation work. In addition, a lexicon of unconventional vocabulary can be built up gradually during translation and continuously enriched by the identification process; with this continuously updated lexicon, fully automatic translation of all documents to be translated, including their unconventional vocabulary, can eventually be achieved.
Claims (10)
1. A method for identifying specific vocabulary in a document to be translated, the specific vocabulary including at least one Latin letter, the method comprising the following steps:
(1) segmenting the document to be translated, identifying the nouns in it, and storing all identified nouns in an ordered list according to their order of position in the document to be translated;
(2) reading the nouns in the ordered list in sequence, and performing semantic analysis on each noun to determine whether the noun belongs to the specific vocabulary;
characterized in that
in step (2), performing semantic analysis on the noun to determine whether the noun belongs to the specific vocabulary specifically comprises:
(21) segmenting the noun byte by byte to obtain multiple feature fields;
(22) if at least one of the multiple feature fields meets a predetermined condition, determining that the noun belongs to the specific vocabulary.
2. The method of claim 1, wherein in step (2) the multiple feature fields obtained by segmenting the noun byte by byte consist of one or more of the following fields: Latin letters, spaces, diacritics, and connectors.
3. The method of claim 2, wherein meeting the predetermined condition means meeting at least one of the following conditions:
(31) the multiple feature fields include multiple Latin letters and also include a connector;
(32) the multiple feature fields include multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
4. The method of claim 3, further comprising, if the multiple feature fields do not meet the predetermined condition, continuing with the following identification steps:
(41) determining whether the multiple feature fields include a space;
(42) if no space is included, determining whether the character string formed by the multiple feature fields conforms to the Scheme for the Chinese Phonetic Alphabet; if it does not, determining that the noun belongs to the specific vocabulary;
(43) if a space is included, determining whether at least one of the two character strings formed by the feature fields before and after the space fails to conform to the Scheme for the Chinese Phonetic Alphabet; if so, determining that the noun belongs to the specific vocabulary.
5. The method of claim 1, wherein storing all identified nouns in an ordered list according to their order of position in the document to be translated further comprises a preprocessing step: determining whether the noun contains Latin letters, and if it does not, not storing the noun.
6. The method of claim 5, wherein it is determined whether the noun contains Latin letters; if it does, it is further determined whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet, and if it does, the noun is not stored.
7. A specific-vocabulary identification system for identifying specific vocabulary in a document to be translated, the specific vocabulary including at least one Latin letter; the system comprising the following modules:
an identification module, which segments the document to be translated, and identifies and outputs the nouns in it;
a preprocessing module, which preprocesses the nouns output by the segmentation module; the preprocessing includes determining whether the noun contains Latin letters, and determining whether the noun conforms to the Scheme for the Chinese Phonetic Alphabet;
a storage module, which stores the nouns processed by the preprocessing module in an ordered list according to their order of position in the document to be translated;
a semantic analysis module, which reads the nouns in the ordered list in sequence and performs semantic analysis on each noun to determine whether the noun belongs to the specific vocabulary;
characterized in that the semantic analysis module comprises a byte segmentation module, a judgment module, and a result output module;
the byte segmentation module segments the noun byte by byte to obtain multiple feature fields;
the judgment module determines whether at least one of the multiple feature fields meets a predetermined condition;
the result output module outputs the identification result for the vocabulary according to the judgment module.
8. The system of claim 7, wherein the multiple feature fields obtained by the byte segmentation module by segmenting the noun byte by byte consist of one or more of the following fields: Latin letters, spaces, diacritics, and connectors.
9. The system of claim 7, wherein meeting the predetermined condition means meeting at least one of the following conditions:
(91) the multiple feature fields include multiple Latin letters and also include a connector;
(92) the multiple feature fields include multiple Latin letters and at least one diacritic, the diacritic being located above or at the upper right of at least one Latin letter.
10. A computer-readable storage medium on which computer-readable instructions are stored, the instructions being executed by a memory and a processor to implement the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253593.2A CN107870905B (en) | 2017-12-04 | 2017-12-04 | Method for identifying specific vocabulary |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253593.2A CN107870905B (en) | 2017-12-04 | 2017-12-04 | Method for identifying specific vocabulary |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107870905A (en) | 2018-04-03 |
CN107870905B (en) | 2021-09-17 |
Family
ID=61755073
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253593.2A (Active, CN107870905B) | 2017-12-04 | 2017-12-04 | Method for identifying specific vocabulary |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107870905B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241543A (en) * | 2018-09-19 | 2019-01-18 | 传神语联网网络科技股份有限公司 | The preconditioning technique of consistency translationese |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030229487A1 (en) * | 2002-06-11 | 2003-12-11 | Fuji Xerox Co., Ltd. | System for distinguishing names of organizations in Asian writing systems |
US20120123766A1 (en) * | 2007-03-22 | 2012-05-17 | Konstantin Anisimovich | Indicating and Correcting Errors in Machine Translation Systems |
CN102708147A (en) * | 2012-03-26 | 2012-10-03 | 北京新发智信科技有限责任公司 | Recognition method for new words of scientific and technical terminology |
CN104572625A (en) * | 2015-01-21 | 2015-04-29 | 北京云知声信息技术有限公司 | Recognition method of named entity |
CN104572632A (en) * | 2014-12-25 | 2015-04-29 | 语联网(武汉)信息技术有限公司 | Method for determining translation direction of word with proper noun translation |
CN106168946A (en) * | 2016-06-24 | 2016-11-30 | 中国科学院信息工程研究所 | A kind of method identifying user initials phenomenon |
CN107247708A (en) * | 2017-07-03 | 2017-10-13 | 中国银行股份有限公司 | A kind of Sex criminals method and system |
Non-Patent Citations (2)
Title |
---|
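Ding Kejia (丁科家): "Wade-Giles romanization and the translation of Chinese proper nouns" (威妥玛式拼音法与汉语专有名词的翻译), English Knowledge (《英语知识》) * |
Yang Jiqiu (杨继秋): "Thoughts prompted by 'Chiang Kai-shek has changed his name'" (从"蒋介石改名了"所想到的), 2009 Annual Meeting and Academic Symposium of the Guizhou Translators Association (《贵州省翻译工作者协会2009年会暨学术研讨会》) * |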
Also Published As
Publication number | Publication date |
---|---|
CN107870905B (en) | 2021-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||