CN107870905B - Method for identifying specific vocabulary - Google Patents

Publication number
CN107870905B
Authority
CN
China
Prior art keywords
noun
nouns
characteristic fields
module
translated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711253593.2A
Other languages
Chinese (zh)
Other versions
CN107870905A (en)
Inventor
郑丽华
何征宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iol Wuhan Information Technology Co ltd
Original Assignee
Iol Wuhan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iol Wuhan Information Technology Co ltd
Priority to CN201711253593.2A
Publication of CN107870905A
Application granted
Publication of CN107870905B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, a system and a computer-readable medium for identifying specific words in a document to be translated. With the method and system, most specific unconventional words that appear during translation can be accurately recognized, and recognition and output can be automated using computer software and/or hardware. The invention avoids translation errors for such special vocabulary and improves translation accuracy. In addition, an unconventional vocabulary library can be built up step by step during translation and continuously enriched through the recognition process; with this constantly updated library, fully automatic translation of texts containing unconventional vocabulary can eventually be achieved.

Description

Method for identifying specific vocabulary
Technical Field
The invention belongs to the field of vocabulary recognition, and particularly relates to a method for recognizing a specific vocabulary in a document to be translated.
Background
Translation work often runs into special vocabulary that is neither a conventional English word nor a conventional Chinese pinyin word. For such words, it is difficult to find a translation that matches the meaning of the original text by relying on an existing conventional translation corpus. Whether the translation is done by machine or by hand, errors are therefore almost inevitable, owing to the limits of the corpus or of the translator's knowledge.
A well-known example among human translators is the translation of "Chiang Kai-shek". In the book Study on the Academic History of the Eastern Section of the Sino-Russian Border: The Eastern Section of the Sino-Russian Border as Seen by Chinese, Russian and Western Scholars, published in October 2008 by a well-known history professor, Jiang Jieshi (written "Chiang Kai-shek" in the Wade-Giles romanization used by the original text) was rendered as "Chang Kaishen"; "Mencius" was likewise once rendered by another well-known scholar as "Menxiusi" (the correct referent is Mengzi). Handling such words is thus difficult even for experts, let alone for the great majority of ordinary translators and for machine translation tools.
The translation of such special words therefore needs special handling; they cannot simply be treated as ordinary English words or translated literally. Because the total number of special words is relatively small, one possible approach is to skip them during translation, keep the original expression to obtain a preliminary translation, and identify the special words afterwards for later processing; alternatively, the special vocabulary can be identified before translation and marked, for example with emphasis labels, so that translation errors are avoided. When done manually, however, this special handling reduces translation speed and quality, and dedicating manual effort to a small number of special words is time-consuming and laborious.
Disclosure of Invention
In view of the above problems, the invention provides a special vocabulary identification method that accurately identifies the special vocabulary in a document to be translated so as to avoid translation errors.
Special vocabulary here mainly means words that are neither conventional English words nor words conforming to the Chinese pinyin scheme.
A "conventional" English word here is one commonly encountered in ordinary language learning. For example, the conventional English name of the city of Guangzhou is "Guangzhou", and a considerable number of people also know "Canton". For historical reasons, "Kwangchow" and "Kuang-chou" are also accurate renderings of the same place name, but for most people these two words are "non-conventional".
Similarly, "Mao Tse-tung", "I Ching" and "Chunghwa" do not conform to the Chinese pinyin scheme, so they too are special vocabulary.
Through extensive statistical research, the inventors found that most special words are nouns, including place names, personal names, organization names and the like. The recognition range is therefore first limited to nouns, which meets practical working requirements.
Therefore, the identification method provided by the invention comprises the following steps:
The file to be translated is segmented, the nouns in it are identified, and all identified nouns are stored in an ordered list according to their order of position in the file to be translated.
Many common algorithms exist in the art for segmenting a document to be translated and identifying the nouns in it. For example, the document can first be divided into sentences, each sentence then analyzed into its structural components (such as subject and predicate), and nouns searched for in the object; or prepositional phrases can be identified and nouns sought there and at other specific positions, such as the subject; or the connectivity between words can be analyzed, judging whether a word, or the words before and after it, are nouns according to whether the connectivity exceeds a threshold; or a dictionary, word bank or corpus can be queried directly to determine whether a word is a noun. These algorithms are not described in detail here.
Not all recognized nouns are special words, so some preprocessing can be applied to screen out the potential special words and reduce the subsequent workload.
Specifically, the following preprocessing means may be adopted:
Judge whether the noun contains Latin letters; if it does not, the noun need not be stored.
If it does, further judge whether the noun conforms to the Chinese pinyin scheme; if it does, the noun need not be stored.
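The two preprocessing checks can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names are hypothetical, and the tiny `PINYIN_SYLLABLES` table is a placeholder for the full inventory of toneless pinyin syllables that a real checker would need.

```python
import re

# Hypothetical, heavily abbreviated syllable table (assumption for this sketch).
PINYIN_SYLLABLES = {"bei", "jing", "guang", "zhou", "shang", "hai", "meng", "zi"}
MAX_SYLLABLE_LEN = 6  # longest toneless syllables, e.g. "zhuang", have 6 letters


def has_latin_letter(noun: str) -> bool:
    """True if the noun contains at least one basic Latin letter."""
    return re.search(r"[A-Za-z]", noun) is not None


def _segmentable(text: str) -> bool:
    """True if text (lowercase a-z only) splits fully into table syllables."""
    if not re.fullmatch(r"[a-z]+", text):
        return False
    reachable = [True] + [False] * len(text)
    for start in range(len(text)):
        if not reachable[start]:
            continue
        for end in range(start + 1, min(len(text), start + MAX_SYLLABLE_LEN) + 1):
            if text[start:end] in PINYIN_SYLLABLES:
                reachable[end] = True
    return reachable[-1]


def is_pinyin(noun: str) -> bool:
    """True if every space-separated part of the noun is a run of syllables."""
    parts = noun.lower().split()
    return bool(parts) and all(_segmentable(part) for part in parts)


def preprocess(nouns):
    """Keep only nouns that contain Latin letters and are not valid pinyin."""
    return [n for n in nouns if has_latin_letter(n) and not is_pinyin(n)]


candidates = preprocess(["北京", "Beijing", "Kwangchow", "Mao Tse-tung", "Guangzhou"])
print(candidates)
```

A table lookup is used here rather than combining initials and finals freely, because a free combination would wrongly accept spellings such as "ching" that the text treats as non-pinyin.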
The nouns remaining in the ordered list after preprocessing are all potential special words and go on to the next analysis: the nouns in the ordered list are read in sequence, and semantic analysis is performed on each to determine whether it belongs to the specific vocabulary.
Here the means and judgment method adopted by the invention are: segment the noun in units of bytes to obtain a plurality of characteristic fields; if at least one of the characteristic fields satisfies a predetermined condition, determine that the noun belongs to the specific vocabulary.
The invention is the first to propose this specific way of recognizing specific vocabulary. First, segmenting the noun in units of bytes makes the resulting characteristic fields as accurate as possible; second, checking whether those byte-level characteristic fields satisfy the predetermined condition captures the "specificity" of the noun to the greatest extent.
For the former, the characteristic fields obtained by segmenting the noun in units of bytes consist of one or more of the following: Latin letters, spaces, additional symbols (such as diacritics) and connectors.
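As a rough sketch of this segmentation, the Python below labels each unit of a noun with one of the four field types. Several things are assumptions of the example rather than statements of the patent: it splits by Unicode character after NFD normalization instead of by raw byte (so that a multi-byte symbol is not cut apart), it treats hyphens as the connectors, and the label names and symbol sets are hypothetical.

```python
import unicodedata

# Assumed symbol sets for this sketch.
CONNECTORS = {"-", "\u2010", "\u2013"}  # hyphen-minus, hyphen, en dash
ASPIRATION_MARKS = {"'", "\u2018", "\u2019", "\u02bb", "\u02bc"}


def feature_fields(noun: str):
    """Split a noun into (character, kind) pairs.

    NFD normalization separates an accented letter into the base letter
    followed by a combining mark, which is then labelled "additional".
    """
    fields = []
    for ch in unicodedata.normalize("NFD", noun):
        if ch.isascii() and ch.isalpha():
            kind = "letter"
        elif ch == " ":
            kind = "space"
        elif ch in CONNECTORS:
            kind = "connector"
        elif ch in ASPIRATION_MARKS or unicodedata.combining(ch):
            kind = "additional"
        else:
            kind = "other"  # anything outside the four field types
        fields.append((ch, kind))
    return fields


for ch, kind in feature_fields("T'ien-fang"):
    print(repr(ch), kind)
```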
For the latter, the satisfaction of the predetermined condition means that at least one of the following conditions is satisfied:
the plurality of characteristic fields comprise a plurality of Latin letters and connectors;
the plurality of characteristic fields comprise a plurality of Latin letters and at least one additional symbol, and the additional symbol is positioned at the upper part or the upper right corner of at least one Latin letter.
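The two alternatives of the predetermined condition can be read as a simple predicate. The sketch below is one possible literal reading, with assumptions labelled: "a plurality" is taken to mean at least two Latin letters, connectors are taken to be hyphens, and additional symbols are taken to be apostrophe-like aspiration marks or combining diacritics. The function name and symbol sets are hypothetical.

```python
import unicodedata

# Assumed symbol sets for this sketch.
CONNECTORS = {"-", "\u2010", "\u2013"}
ASPIRATION_MARKS = {"'", "\u2018", "\u2019", "\u02bb", "\u02bc"}


def meets_predetermined_condition(noun: str) -> bool:
    """True if the noun has several Latin letters plus a connector,
    or several Latin letters plus at least one additional symbol."""
    chars = unicodedata.normalize("NFD", noun)
    letter_count = sum(1 for ch in chars if ch.isascii() and ch.isalpha())
    has_connector = any(ch in CONNECTORS for ch in chars)
    has_additional = any(
        ch in ASPIRATION_MARKS or unicodedata.combining(ch) for ch in chars
    )
    return letter_count >= 2 and (has_connector or has_additional)


for word in ["Mao Tse-tung", "Kuang-chou", "Ch'eng T'ien-fang", "Kwangchow"]:
    print(word, meets_predetermined_condition(word))
```

Under this literal reading an ordinary hyphenated English noun would also satisfy the first alternative, so a practical system would probably combine the predicate with a dictionary lookup.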
Through these steps, the invention can identify at least such special words as "Mao Tse-tung", "Kuang-chou", "Chiang Kai-shek" and "Ch'eng T'ien-fang".
The emphasis in "additional symbol" is on "additional": such a symbol does not appear in conventional spelling. For example, aspiration marks (') do not normally appear in English text, nor do other marks above, at the upper right of, or elsewhere around a letter.
The additional symbols of the invention are therefore not limited to the aspiration mark ('), nor to symbols located above or at the upper right of a Latin letter; they may also appear at other positions.
The predetermined condition captures one of the most prominent features of specific vocabulary. Some cases may still be missed, however, such as the aforementioned "Kwangchow", "I Ching" and "Chunghwa", and these need a further judgment: if none of the characteristic fields satisfies the predetermined condition, the following identification steps continue:
judging whether the characteristic fields contain a space;
if no space is contained, judging whether the string formed by the characteristic fields conforms to the Chinese pinyin scheme; if not, determining that the noun belongs to the specific vocabulary;
if a space is contained, judging whether at least one of the two strings formed by the characteristic fields before and after the space fails to conform to the Chinese pinyin scheme; if so, determining that the noun belongs to the specific vocabulary.
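The fallback steps above can be sketched as a small function. To keep the example short, the pinyin check is passed in as a callable; `is_pinyin` and the toy word set used to demonstrate it are assumptions of this sketch, and the handling of nouns with more than one space (every part is checked) is a generalization not stated in the text.

```python
from typing import Callable


def fallback_is_specific(noun: str, is_pinyin: Callable[[str], bool]) -> bool:
    """Apply the space-based fallback to a noun that did not meet
    the predetermined condition."""
    if " " not in noun:
        # No space: specific if the whole string is not valid pinyin.
        return not is_pinyin(noun)
    # Space present: specific if any part around the space is not valid pinyin.
    return any(not is_pinyin(part) for part in noun.split())


# Toy stand-in for a real pinyin checker (hypothetical word list).
KNOWN_PINYIN = {"beijing", "guangzhou", "shang", "hai"}


def toy_is_pinyin(text: str) -> bool:
    return text.lower() in KNOWN_PINYIN


for word in ["Kwangchow", "Chunghwa", "I Ching", "Beijing", "Shang Hai"]:
    print(word, fallback_is_specific(word, toy_is_pinyin))
```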
By this standard, "Kwangchow" and "Chunghwa" contain no space, and the strings they form do not conform to the Chinese pinyin scheme; "I Ching" contains a space, and the "Ching" after the space does not conform to the Chinese pinyin scheme, while "I" on its own is not a valid pinyin spelling either.
Thus, the present invention can continue to recognize such special words.
It can be seen that the above recognition method proposed by the present invention can be automatically implemented by a computer program. By the method, most of special words in the document to be translated can be accurately identified.
In another aspect of the invention, a specific vocabulary recognition system is also provided for recognizing a specific vocabulary in a to-be-translated file, wherein the specific vocabulary comprises at least one Latin letter; the system comprises the following modules:
the recognition module is used for segmenting the to-be-translated file, recognizing and outputting nouns in the to-be-translated file;
the preprocessing module is used for preprocessing the nouns output by the recognition module; the preprocessing comprises: judging whether the noun contains Latin letters; and judging whether the noun conforms to the Chinese pinyin scheme;
the storage module stores the nouns processed by the preprocessing module in an ordered list according to the position sequence of the nouns in the file to be translated;
the semantic analysis module is used for reading the nouns in the ordered list in sequence and carrying out semantic analysis on the nouns so as to determine whether the nouns belong to a specific vocabulary or not;
it is characterized in that the semantic analysis module comprises a byte segmentation module, a judgment module and a result output module,
the byte segmentation module segments the noun by taking bytes as a unit to obtain a plurality of characteristic fields;
the judging module is used for judging whether at least one of the characteristic fields meets a preset condition;
and the result output module outputs the recognition result of the vocabulary according to the judgment module.
The recognition system can be used to execute the recognition method provided by the invention; it comprises the corresponding functional modules and is implemented in computer hardware or software. When implemented in software, the method may be embodied in a computer-readable storage medium storing computer-readable instructions that are executed by a memory and a processor.
It should be noted that whether a word counts as specific vocabulary here depends not only on conventional vocabulary but also on the translator's current level of knowledge. For example, given the state of knowledge at the time of the mistranslation by the well-known history professor Wang, "Chiang Kai-shek" was specific vocabulary as defined in the invention. With wide cultural dissemination and the passage of time, however, "Chiang Kai-shek" is now a common word rather than a specific word, even for those skilled in the art, because the relevant translation corpora and translation tools already store its correct translation, Jiang Jieshi. The same is true of "Mencius", which existing translation tools can correctly recognize and translate as Mengzi.
However, as with the first translations of "Chiang Kai-shek" and "Mencius", many documents to be translated still contain, for historical reasons, a large number of similar specific words. When such a word is translated for the first time, the translator may still err because there is nothing to refer to, and existing translation corpora and tools cannot anticipate the situation in advance. The method of the invention is therefore still needed to keep recognizing specific words during translation.
For each identified specific word, it can be judged whether an accurate translation already exists; for example, a specific vocabulary corpus can be established to save the existing translations of specific words, with new specific words continuously added so that the corpus is kept up to date.
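One simple way to picture such a continuously updated corpus is a mapping from each recognized word to its confirmed translation, with untranslated words kept as pending entries. The class and method names below are hypothetical, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
from typing import Dict, List, Optional


class SpecificVocabularyLibrary:
    """In-memory store of specific words and their confirmed translations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Optional[str]] = {}

    def lookup(self, word: str) -> Optional[str]:
        """Return the stored translation, or None if none is known yet."""
        return self._entries.get(word)

    def register(self, word: str, translation: Optional[str] = None) -> None:
        """Add a recognized word; never overwrite a translation with None."""
        if translation is not None or word not in self._entries:
            self._entries[word] = translation

    def pending(self) -> List[str]:
        """Words that were recognized but still lack a translation."""
        return sorted(w for w, t in self._entries.items() if t is None)


library = SpecificVocabularyLibrary()
library.register("Chiang Kai-shek", "蒋介石")  # translation already confirmed
library.register("Kwangchow")                  # newly recognized, untranslated
library.register("Chiang Kai-shek")            # seen again; entry is kept
print(library.lookup("Chiang Kai-shek"), library.pending())
```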
Therefore, with the method and system of the invention, most specific unconventional words that appear during translation can be accurately recognized, and recognition and output can be automated using computer software and/or hardware. The invention avoids translation errors for such special vocabulary and improves translation accuracy. In addition, an unconventional vocabulary library can be built up step by step during translation and continuously enriched through the recognition process; with this constantly updated library, fully automatic translation of texts containing unconventional vocabulary can eventually be achieved.
Drawings
Fig. 1 is a flow chart of the identification method of the present invention.
Fig. 2 is a block diagram of the identification system of the present invention.
Detailed Description
Referring to fig. 1, the proposed identification method of the present invention comprises the following steps:
S1: segmenting the to-be-translated file and identifying the nouns in it;
S2: judging whether the current noun contains Latin letters; if not, the noun need not be stored and the next noun is judged; otherwise, going to step S3;
S3: judging whether the noun conforms to the Chinese pinyin scheme; if so, the noun need not be stored and the next noun is judged; otherwise, going to step S4;
S4: storing the recognized nouns in an ordered list according to their order of position in the to-be-translated file;
S5: reading the nouns in the ordered list in sequence;
S6: segmenting the noun in units of bytes to obtain a plurality of characteristic fields;
S7: determining whether at least one of the characteristic fields satisfies a predetermined condition; if so, outputting the noun as special vocabulary; otherwise, reading the next noun and continuing the judgment until all nouns in the ordered list have been processed.
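The control flow of steps S2 to S7 can be sketched as below, with the individual checks passed in as callables so that only the ordering and filtering logic is shown. The function name, the `(position, noun)` input format standing in for the output of S1, and the toy lambdas in the usage lines are all assumptions of this example.

```python
from typing import Callable, Iterable, List, Tuple


def recognize_specific_words(
    nouns_with_position: Iterable[Tuple[int, str]],
    has_latin: Callable[[str], bool],
    is_pinyin: Callable[[str], bool],
    meets_condition: Callable[[str], bool],
) -> List[str]:
    """Filter nouns (S2, S3), order them by position (S4), then read them
    in sequence and keep those meeting the predetermined condition (S5-S7)."""
    ordered = sorted(
        (position, noun)
        for position, noun in nouns_with_position
        if has_latin(noun) and not is_pinyin(noun)
    )
    return [noun for _, noun in ordered if meets_condition(noun)]


# Toy checks, for illustration only.
nouns = [(30, "Kuang-chou"), (5, "Mao Tse-tung"), (12, "北京"),
         (18, "Beijing"), (40, "Kwangchow")]
result = recognize_specific_words(
    nouns,
    has_latin=lambda n: any(c.isascii() and c.isalpha() for c in n),
    is_pinyin=lambda n: n.lower() in {"beijing"},
    meets_condition=lambda n: "-" in n or "'" in n,
)
print(result)
```

"Kwangchow" passes the preprocessing but fails the toy condition here; in the described method it would be caught by the later space-based judgment rather than by step S7.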
The steps shown in fig. 1 are only one specific implementation of the method of the invention. In practice, the order of steps S2 and S3 may be reversed; step S3, or step S2, may instead be performed after step S4; likewise, S2 or S3 may be executed after the determination in step S7 is negative. Those skilled in the art will appreciate that the steps may be performed separately or in combination, so long as the specific vocabulary is ultimately recognized according to the predetermined criteria.
For example, the method may omit the judgment of step S3 at the outset and instead, on finding that none of the characteristic fields satisfies the predetermined condition, continue with the following identification steps:
judging whether the characteristic fields contain a space;
if no space is contained, judging whether the string formed by the characteristic fields conforms to the Chinese pinyin scheme; if not, determining that the noun belongs to the specific vocabulary;
if a space is contained, judging whether at least one of the two strings formed by the characteristic fields before and after the space fails to conform to the Chinese pinyin scheme; if so, determining that the noun belongs to the specific vocabulary.
Fig. 2 shows an identification system of the present invention, which includes the following modules:
the recognition module is used for segmenting the to-be-translated file, recognizing and outputting nouns in the to-be-translated file;
the preprocessing module is used for preprocessing the nouns output by the recognition module; the preprocessing comprises: judging whether the noun contains Latin letters; and judging whether the noun conforms to the Chinese pinyin scheme;
the storage module stores the nouns processed by the preprocessing module in an ordered list according to the position sequence of the nouns in the file to be translated;
the semantic analysis module is used for reading the nouns in the ordered list in sequence and carrying out semantic analysis on the nouns so as to determine whether the nouns belong to a specific vocabulary or not;
it is characterized in that the semantic analysis module comprises a byte segmentation module, a judgment module and a result output module,
the byte segmentation module segments the noun by taking bytes as a unit to obtain a plurality of characteristic fields;
the judging module is used for judging whether at least one of the characteristic fields meets a preset condition;
and the result output module outputs the recognition result of the vocabulary according to the judgment module.
In general, with the method and system of the invention, most specific unconventional words that appear during translation can be accurately recognized, and recognition and output can be automated using computer software and/or hardware. The invention avoids translation errors like those described in the background section and improves translation accuracy; in addition, an unconventional vocabulary library can be built up step by step during translation and continuously enriched through the recognition process; with this constantly updated library, fully automatic translation of texts containing unconventional vocabulary can eventually be achieved.

Claims (5)

1. A method for recognizing a specific word in a file to be translated, wherein the specific word comprises at least one Latin letter, and the recognition method is used for segmenting the file to be translated and recognizing a noun in the file to be translated;
the method is characterized in that: the method comprises the following steps:
storing all the recognized nouns in an ordered list according to the position sequence of the nouns in the file to be translated;
reading nouns in the ordered list in sequence, and performing semantic analysis on the nouns to determine whether the nouns belong to a specific vocabulary;
performing semantic analysis on the noun to determine whether the noun belongs to a specific vocabulary, specifically including:
the noun is segmented by taking bytes as a unit to obtain a plurality of characteristic fields;
determining that the noun belongs to a specific vocabulary if at least one of the plurality of feature fields satisfies a predetermined condition;
the plurality of characteristic fields obtained by segmenting the noun by taking bytes as a unit are composed of one or more of the following fields: latin letters, spaces, additional symbols and connectors;
the satisfaction of the predetermined condition means that at least one of the following conditions is satisfied:
the plurality of characteristic fields comprise a plurality of Latin letters and connectors;
the plurality of characteristic fields comprise a plurality of Latin letters and at least one additional symbol, and the additional symbol is positioned at the upper part or the upper right corner of at least one Latin letter;
if none of the plurality of characteristic fields meets the predetermined condition, continuing the following identification steps:
(41) judging whether the characteristic fields contain spaces or not;
(42) if no space is contained, judging whether the string formed by the plurality of characteristic fields conforms to the Chinese pinyin scheme; if not, determining that the noun belongs to the specific vocabulary;
(43) if a space is contained, judging whether at least one of the two strings formed by the characteristic fields before and after the space fails to conform to the Chinese pinyin scheme, and if so, determining that the noun belongs to the specific vocabulary.
2. The method of claim 1, wherein:
the storing of all the recognized nouns in an ordered list according to the position sequence of the nouns in the file to be translated further comprises the preprocessing step of: judging whether the noun contains Latin letters, and if not, not storing the noun.
3. The method of claim 2, wherein:
judging whether the noun contains Latin letters; if yes, whether the noun accords with the Chinese pinyin scheme or not is continuously judged, and if the noun accords with the Chinese pinyin scheme, the noun does not need to be stored.
4. A specific vocabulary recognition system is used for recognizing a specific vocabulary in a file to be translated, wherein the specific vocabulary comprises at least one Latin letter;
the system comprises the following modules:
the recognition module is used for segmenting the to-be-translated file, recognizing and outputting nouns in the to-be-translated file;
the preprocessing module is used for preprocessing the nouns output by the recognition module; the preprocessing comprises: judging whether the noun contains Latin letters; and judging whether the noun conforms to the Chinese pinyin scheme;
the storage module stores the nouns processed by the preprocessing module in an ordered list according to the position sequence of the nouns in the file to be translated;
the semantic analysis module is used for reading the nouns in the ordered list in sequence and carrying out semantic analysis on the nouns so as to determine whether the nouns belong to a specific vocabulary or not;
the semantic analysis module comprises a byte segmentation module, a judgment module and a result output module, wherein the byte segmentation module segments the noun by taking bytes as units to obtain a plurality of characteristic fields;
the judging module is used for judging whether at least one of the characteristic fields meets a preset condition;
the result output module outputs the recognition result of the vocabulary according to the judgment module;
the byte segmentation module is used for segmenting the noun by taking bytes as a unit to obtain a plurality of characteristic fields, and the characteristic fields comprise one or more of the following fields: latin letters, spaces, additional symbols and connectors; the satisfaction of the predetermined condition means that at least one of the following conditions is satisfied:
the plurality of characteristic fields comprise a plurality of Latin letters and connectors;
the plurality of characteristic fields comprise a plurality of Latin letters and at least one additional symbol, and the additional symbol is positioned at the upper part or the upper right corner of at least one Latin letter;
if none of the plurality of characteristic fields meets the predetermined condition, continuing the following identification steps:
(41) judging whether the characteristic fields contain spaces or not;
(42) if no space is contained, judging whether the string formed by the plurality of characteristic fields conforms to the Chinese pinyin scheme; if not, determining that the noun belongs to the specific vocabulary;
(43) if a space is contained, judging whether at least one of the two strings formed by the characteristic fields before and after the space fails to conform to the Chinese pinyin scheme, and if so, determining that the noun belongs to the specific vocabulary.
5. A computer-readable storage medium having computer-readable stored instructions stored thereon, the instructions being executable by a memory and a processor for implementing the method of any one of claims 1-3.
CN201711253593.2A 2017-12-04 2017-12-04 Method for identifying specific vocabulary Active CN107870905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711253593.2A CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711253593.2A CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Publications (2)

Publication Number Publication Date
CN107870905A CN107870905A (en) 2018-04-03
CN107870905B CN107870905B (en) 2021-09-17

Family

ID=61755073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711253593.2A Active CN107870905B (en) 2017-12-04 2017-12-04 Method for identifying specific vocabulary

Country Status (1)

Country Link
CN (1) CN107870905B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241543B (en) * 2018-09-19 2023-05-30 传神语联网网络科技股份有限公司 Preprocessing technique for consistent translation terms

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247708A (en) * 2017-07-03 2017-10-13 中国银行股份有限公司 A kind of Sex criminals method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7136805B2 (en) * 2002-06-11 2006-11-14 Fuji Xerox Co., Ltd. System for distinguishing names of organizations in Asian writing systems
US8959011B2 (en) * 2007-03-22 2015-02-17 Abbyy Infopoisk Llc Indicating and correcting errors in machine translation systems
CN102708147B (en) * 2012-03-26 2015-02-18 北京新发智信科技有限责任公司 Recognition method for new words of scientific and technical terminology
CN104572632B (en) * 2014-12-25 2017-07-04 武汉传神信息技术有限公司 A kind of method in the translation direction for determining the vocabulary with proper name translation
CN104572625A (en) * 2015-01-21 2015-04-29 北京云知声信息技术有限公司 Recognition method of named entity
CN106168946A (en) * 2016-06-24 2016-11-30 中国科学院信息工程研究所 A kind of method identifying user initials phenomenon

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wade-Giles romanization and the translation of Chinese proper nouns (威妥玛式拼音法与汉语专有名词的翻译); 丁科家; 《英语知识》; 2012-12-31; pp. 33-34 *

Also Published As

Publication number Publication date
CN107870905A (en) 2018-04-03

Similar Documents

Publication Publication Date Title
US10061768B2 (en) Method and apparatus for improving a bilingual corpus, machine translation method and apparatus
JP5356197B2 (en) Word semantic relation extraction device
US20180089169A1 (en) Method, non-transitory computer-readable recording medium storing a program, apparatus, and system for creating similar sentence from original sentences to be translated
Zhang et al. HANSpeller++: A unified framework for Chinese spelling correction
US20210019476A1 (en) Methods and apparatus to improve disambiguation and interpretation in automated text analysis using transducers applied on a structured language space
CN111046660B (en) Method and device for identifying text professional terms
CN111950301A (en) English translation quality analysis method and system for Chinese translation and English translation
CN110413972B (en) Intelligent table name field name complementing method based on NLP technology
Bhatti et al. Word segmentation model for Sindhi text
KR20100082980A (en) Method for tagging part of speech and homograph, terminal device using the same
CN107870905B (en) Method for identifying specific vocabulary
Mon et al. SymSpell4Burmese: symmetric delete Spelling correction algorithm (SymSpell) for burmese spelling checking
CN107491441B (en) Method for dynamically extracting translation template based on forced decoding
Wu et al. Integrating dictionary and web N-grams for chinese spell checking
Wang et al. Conditional Random Field-based Parser and Language Model for Tradi-tional Chinese Spelling Checker
CN109325237B (en) Complete sentence recognition method and system for machine translation
CN109344389B (en) Method and system for constructing Chinese blind comparison bilingual corpus
CN107590132B (en) Method for automatically correcting part of characters-judging by English part of speech
Lehal et al. Sangam: A Perso-Arabic to Indic script machine transliteration model
Chiu et al. Chinese spell checking based on noisy channel model
JPS59165179A (en) Dictionary look-up system
US10042843B2 (en) Method and system for searching words in documents written in a source language as transcript of words in an origin language
Mekki et al. COTA 2.0: An automatic corrector of tunisian Arabic social media texts
Kaur et al. Toward normalizing romanized gurumukhi text from social media
CN112966510A (en) Weapon equipment entity extraction method, system and storage medium based on ALBERT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant