CN108733658A - Institution term Chinese-English translation method - Google Patents

Institution term Chinese-English translation method

Publication number
CN108733658A
Authority
CN
China
Prior art keywords
translation
entity
institution term
institution
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710779839.3A
Other languages
Chinese (zh)
Inventor
李斌
杨建华
汤诗华
钱丰收
马宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Radio And Television University
Original Assignee
Anhui Radio And Television University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Radio And Television University
Priority to CN201710779839.3A
Publication of CN108733658A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The invention discloses a Chinese-English translation method for institution terms (organization names). The steps are: obtain the query expansion set corresponding to an institution term entity; retrieve web resources using new queries that include the expansion set, obtaining mixed bilingual snippet resources; extract candidate translations of the institution term entity from the mixed bilingual snippets and rank them by confidence; and obtain the translation result. The query expansion combines two methods: query construction from entity translation results, and query expansion by translating co-occurring topic words. When the translation model is built, a greedy algorithm selects the optimal alignment for each translation pair, which improves the accuracy and efficiency of the subsequent language block extraction and language block translation probability calculation. The invention takes the internal structure of institution terms into account and builds the translation model with the language block as the translation unit. It focuses on the extraction of candidate language blocks, their probability calculation, and a context-free translation decoding algorithm, which reduces the time complexity of translation and improves its accuracy and efficiency.

Description

Institution term Chinese-English translation method
Technical field
The present invention relates to the field of language translation, and in particular to a Chinese-English translation method for institution terms.
Background art
Compared with named entities such as person names and place names, institution terms have a more complex structure, because an institution term may itself contain a person name, a place name, or even another institution name. Institution terms are usually translated by combining transliteration with free (semantic) translation, and their complex structure also requires a certain amount of word reordering. Translating institution terms therefore has to solve not only the problems inherent in general machine translation, such as word selection and word reordering, but also the transliteration problem and the problem of combining transliteration with free translation. For these reasons, institution term translation remains a difficult and challenging problem in natural language processing.
At present, research on institution term translation based on local translation models is relatively deep and mature. Statistical transliteration models solve, to some extent, the transliteration of names that follow transliteration rules, but they cannot handle names that follow the rules only partly or not at all. Phrase-based, context-sensitive institution term models are improvements on conventional machine translation models; they do not consider the internal structure of institution terms, and their time complexity is high. Translation models that handle the institution term as a whole (transliteration together with free translation) are not yet mature and have been studied relatively little, so further research is needed.
Summary of the invention
To solve the above technical problem, the present invention proposes a Chinese-English translation method for institution terms, with the aim of translating institution terms more accurately.
To achieve this aim, the technical solution of the invention is as follows:
A Chinese-English translation method for institution terms, comprising the following steps:
Step 1: Obtain the query expansion set corresponding to the institution term entity;
Step 2: Retrieve web resources using new queries that include the expansion set, obtaining mixed bilingual snippet resources;
Step 3: Extract candidate translations of the institution term entity from the mixed bilingual snippet resources and rank them by confidence;
Step 4: Obtain the translation result.
Preferably, the query expansion set of step 1 comprises two parts: query construction from institution term entity translation results, and query expansion by translating co-occurring topic words.
Query construction from institution term entity translation results proceeds as follows: build institution term translation pairs; align each translation pair internally; extract language blocks according to the computed translation confidence; generate an institution term translation model based on the language blocks; and extract the effective information from the translation results.
Query expansion by translating co-occurring topic words proceeds as follows: submit the source query to a search engine and obtain source-language snippets that contain the source query; use TF-IDF to extract, from these snippets, the topic words that co-occur with the source query; then look up the translations of these topic words in a bilingual dictionary. The translations form the final expansion set of this method.
Preferably, the internal alignment proceeds as follows: run the GIZA++ word alignment tool, which is widely used in machine translation, on the Chinese-English translation pairs of institution names in both directions (Chinese-to-English and English-to-Chinese), and take the intersection of the two alignment results as the alignment anchors; extract candidate strings by extending each alignment anchor to the left and to the right until the next alignment anchor is reached, the current anchor plus the extended words forming a candidate string; compute the translation confidence of each bilingual string pair; and, for each named entity translation pair, obtain the optimal alignment with a greedy algorithm.
Preferably, the translation confidence is computed by scoring the extracted translation fragments with a method similar to TF-IDF. For a given Chinese string o and English string e, the translation confidence is calculated as follows:
Preferably, the language block extraction uses a context-free translation decoding algorithm. An institution term is divided into three parts: a keyword part, a region or scope modifier part, and an other-modifier part. First, each aligned institution term entity pair is split into these three parts, and for the first two kinds of part the derivation position within the whole named entity is retained; this yields a series of derivation rules with corresponding confidences. The translation of a given named entity then comprises: language block splitting, i.e., splitting the given institution term into the three parts; and entity derivation translation, in which the parts are translated in the order region or scope modifier, keyword, other modifiers. If a part of some class does not occur in the training corpus, it is translated by conventional machine translation combined with transliteration.
Preferably, the greedy algorithm obtains the optimal alignment as follows: for a specific named entity pair, extract all {c, e} contained in the pair; sort them in descending order of score and store them in the set scoreArray; remove the first element {cc, ee} from scoreArray and update the alignment of the named entity pair according to {cc, ee}; delete all {cc, *} and {*, ee} from scoreArray; repeat from the sorting step until scoreArray is empty; the result is the best alignment of the named entity pair.
Preferably, the extraction of institution term entity translation candidates in step 3 combines a frequency variation measure with adjacency information to extract candidate translation strings; it then computes the translation similarity, co-occurrence information, length information and transliteration information between each candidate translation string and the entity to be translated, combines the feature scores into an overall score, sorts the candidates by that score, and outputs the ranked translations.
The invention has the following advantages:
(1) By studying in depth the combination of a translation model with web-based translation extraction for Chinese-English institution terms, the invention realizes a high-performance institution term translation system that combines translation with the web. The system can mine all possible translations of a given institution term and compute the confidence of each translation, and it extracts the web pages that contain the translation for the user to read. The user finally corrects the translation result, and a Chinese-English institution term translation dictionary is built on this basis.
(2) The invention obtains the optimal alignment with a greedy algorithm, which improves the accuracy and efficiency of the subsequent language block extraction and language block translation probability calculation.
(3) The invention takes the internal structure of institution terms into account and builds the translation model with the language block as the translation unit. It focuses on the extraction of candidate language blocks, their probability calculation, and a context-free translation decoding algorithm, which reduces the time complexity of translation and improves its accuracy and efficiency.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flowchart of the translation method disclosed in an embodiment of the invention;
Fig. 2 is a flowchart of building the translation model disclosed in an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings of the embodiments.
The invention provides a Chinese-English translation method for institution terms. Its working principle is to combine a translation model with web-based translation extraction, realizing a high-performance institution term translation system that combines translation with the web and translates institution terms accurately and efficiently.
The invention is described in further detail below with reference to the embodiments.
As shown in Fig. 1 and Fig. 2, the invention is implemented in the following steps:
Step 1: Obtain the query expansion set corresponding to the institution term entity;
Step 2: Retrieve web resources using new queries that include the expansion set, obtaining mixed bilingual snippet resources;
Step 3: Extract candidate translations of the institution term entity from the mixed bilingual snippet resources and rank them by confidence;
Step 4: Obtain the translation result.
The query expansion set of step 1 comprises two parts: query construction from institution term entity translation results, and query expansion by translating co-occurring topic words.
Query construction from institution term entity translation results proceeds as follows: build institution term translation pairs; align each translation pair internally; extract language blocks according to the computed translation confidence; generate an institution term translation model based on the language blocks; and extract the effective information from the translation results.
Query expansion by translating co-occurring topic words proceeds as follows: submit the source query to a search engine and obtain source-language snippets that contain the source query; use TF-IDF to extract, from these snippets, the topic words that co-occur with the source query; then look up the translations of these topic words in a bilingual dictionary. The translations form the final expansion set of this method.
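The co-occurring topic word expansion can be illustrated with a minimal sketch. The patent does not specify the TF-IDF variant, the word segmenter, or the dictionary, so everything below is an assumption made for illustration: the snippets are already segmented into tokens, document frequencies come from a hypothetical background collection (`background_df`, `background_n`), and the function names and toy data are invented.

```python
import math
from collections import Counter

def cooccurring_topic_words(snippets, query, background_df, background_n, top_k=2):
    """Rank words that co-occur with `query` in the snippets by a TF-IDF score."""
    tf = Counter()
    for tokens in snippets:
        if query in tokens:  # use only snippets that contain the source query
            tf.update(t for t in tokens if t != query)
    scores = {
        t: f * math.log(background_n / (1 + background_df.get(t, 0)))
        for t, f in tf.items()
    }
    # Highest score first; ties are broken by string order for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def expansion_set(topic_words, bilingual_dict):
    """Translate topic words with a bilingual dictionary; unknown words are skipped."""
    return [bilingual_dict[w] for w in topic_words if w in bilingual_dict]

# Toy data: every value here is an illustrative assumption.
query = "安徽广播电视大学"
snippets = [
    [query, "是", "开放", "教育", "的", "大学"],
    [query, "开放", "教育", "的", "远程", "教育"],
    ["其他", "的", "新闻"],  # does not contain the query, so it is ignored
]
background_df = {"的": 990, "是": 950, "教育": 50, "开放": 40, "大学": 100, "远程": 20}
toy_dict = {"教育": "education", "开放": "open", "远程": "distance"}

topics = cooccurring_topic_words(snippets, query, background_df, background_n=1000)
expanded = expansion_set(topics, toy_dict)
```

Function words such as "的" receive a score close to zero because they occur in almost every background document, so only content words end up in the expansion set.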
Existing research on the structure and translation characteristics of institution terms shows that the words inside a Chinese institution term are all content words, each translated into one or more English words, and that the words of an English institution term, apart from "of", "with", "the", "and" and "for", are also all content words. Word alignment inside institution terms exhibits a block-wise structure. The invention therefore establishes an internal word alignment method for institution terms based on extending alignment anchors to the left and right; the key problems are computing the probability of the internally aligned strings and selecting the globally optimal alignment.
First, the GIZA++ word alignment tool, which is widely used in machine translation, aligns the words of the Chinese-English translation pairs of institution names in both directions (Chinese-to-English and English-to-Chinese). In English-to-Chinese alignment, GIZA++ allows each Chinese word (assuming the text has been word-segmented) to correspond to at most one English word; likewise, in the reverse direction each English word may correspond to only one Chinese word. An alignment anchor is a Chinese word and an English word that are aligned with each other in both directions. Second, the method proposed by the invention refines the word alignment result of the first step. The method comprises the following steps:
Step 1: Run the GIZA++ word alignment tool on the Chinese-English translation pairs of institution names in both directions (Chinese-to-English and English-to-Chinese), and take the intersection of the two alignment results as the alignment anchors;
Step 2: Extract candidate strings: extend each alignment anchor to the left and to the right until the next alignment anchor is reached; the current anchor plus the extended words forms a candidate string;
Step 3: Compute the translation confidence of each bilingual string pair;
Step 4: For each named entity translation pair, obtain the optimal alignment with a greedy algorithm;
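Steps 1 and 2 above can be sketched as follows. This is one possible reading, not the patent's implementation: the alignment links are assumed to have been read from the two GIZA++ runs into sets of (Chinese index, English index) pairs, and "extending until the next anchor" is interpreted as enumerating every span that contains the anchor and stops short of the neighbouring anchors. The function names and the toy links are hypothetical.

```python
def alignment_anchors(zh_to_en_links, en_to_zh_links):
    """Anchors are the links found in both alignment directions.

    Each link is a (chinese_index, english_index) pair.
    """
    return sorted(set(zh_to_en_links) & set(en_to_zh_links))

def candidate_spans(anchor_positions, n_tokens):
    """Enumerate candidate spans on one side of the pair.

    Every span (start, end), inclusive, contains exactly one anchor position
    and never reaches the previous or the next anchor.
    """
    anchors = sorted(anchor_positions)
    spans = []
    for k, pos in enumerate(anchors):
        left = anchors[k - 1] + 1 if k > 0 else 0
        right = anchors[k + 1] - 1 if k + 1 < len(anchors) else n_tokens - 1
        for start in range(left, pos + 1):
            for end in range(pos, right + 1):
                spans.append((start, end))
    return spans

# Toy example: 4 Chinese tokens, 5 English tokens (links are invented).
english = ["National", "Committee", "of", "Industry", "Safety"]
zh_to_en = {(0, 0), (1, 4), (2, 3), (3, 1)}
en_to_zh = {(0, 0), (3, 1), (2, 4)}

anchors = alignment_anchors(zh_to_en, en_to_zh)
english_spans = candidate_spans([e for _, e in anchors], len(english))
english_candidates = [" ".join(english[i:j + 1]) for i, j in english_spans]
```

The same `candidate_spans` call would be applied to the Chinese side, and every Chinese candidate paired with every English candidate around the same anchor gives the {c, e} pairs scored in step 3.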
The main algorithms in the above steps are as follows:
The translation confidence is computed by scoring the extracted translation fragments with a method similar to TF-IDF. For a given Chinese string o and English string e, the translation confidence is calculated as follows:
In the formula, the terms denote, in order: the number of co-occurrences of e and o; the number of categories of e that are mutual translations with o; a length penalty parameter for the Chinese string; the Chinese fragment o being a translation of an English fragment e; and N, the number of categories of all English entity fragments.
The optimal alignment is obtained after the probability of each candidate pair of Chinese string c and English string e has been computed. The invention uses a greedy strategy to obtain the optimal alignment, as follows:
Step 1: For a specific named entity pair, extract all {c, e} contained in the pair;
Step 2: Sort the {c, e} in descending order of score and store them in the set scoreArray;
Step 3: Remove the first element {cc, ee} from scoreArray and update the alignment of the named entity pair according to {cc, ee};
Step 4: Delete all {cc, *} and {*, ee} from scoreArray;
Step 5: Repeat from Step 2 until scoreArray is empty;
Step 6: The result is the best alignment of the named entity pair.
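The six greedy steps can be sketched as below. The sketch assumes the candidate pairs and their confidence scores are already available as a dictionary; the function name and the toy scores are hypothetical. Following the text, only pairs that share exactly the chosen Chinese string or the chosen English string are deleted; a fuller implementation might also delete candidates whose spans overlap the chosen ones.

```python
def greedy_alignment(scored_pairs):
    """Pick the best-scoring (c, e) pairs greedily.

    scored_pairs maps (chinese_string, english_string) to a confidence score.
    """
    # Step 2: sort by descending score.
    score_array = sorted(scored_pairs.items(), key=lambda item: -item[1])
    alignment = []
    while score_array:  # Step 5: loop until scoreArray is empty
        # Step 3: take the first (best) element and record it.
        (cc, ee), _ = score_array.pop(0)
        alignment.append((cc, ee))
        # Step 4: drop every remaining {cc, *} and {*, ee}.
        # Filtering keeps the descending order, so no re-sort is needed.
        score_array = [
            ((c, e), s) for (c, e), s in score_array if c != cc and e != ee
        ]
    return alignment  # Step 6

toy_scores = {
    ("c1", "e1"): 0.9,
    ("c1", "e2"): 0.8,
    ("c2", "e2"): 0.7,
    ("c2", "e1"): 0.6,
    ("c3", "e3"): 0.5,
}
best = greedy_alignment(toy_scores)
```

Each pass removes at least one element, so the loop runs at most once per candidate pair; the cost is dominated by the initial sort plus the filtering passes.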
The language-block-based institution term translation method builds the translation model with the language block as the translation unit, and focuses on the extraction of candidate language blocks, their probability calculation, and a context-free translation decoding algorithm.
The invention translates institution terms using a synchronous context-free grammar. Specifically, an institution term is divided into three parts: a keyword part, a region or scope modifier part, and an other-modifier part. First, each aligned institution term entity pair is split into these three parts, and for the first two kinds of part the derivation position within the whole named entity is retained; this yields a series of derivation rules with corresponding confidences. The translation of a given named entity then comprises: language block splitting, i.e., splitting the given institution term into the three parts; and entity derivation translation, in which the parts are translated in the order region or scope modifier, keyword, other modifiers. If a part of some class does not occur in the training corpus, it is translated by conventional machine translation combined with transliteration.
For example, after training, the pair <the national safety in production committee, National Committee of Industry Safety> yields three rules, each with its translation probability. Rule one: <national #, National #>; rule two: <# committee, Committee of #>; rule three: <safety in production, Industry Safety>.
The translation of "the national safety in production committee" proceeds as follows. Language block splitting divides the named entity into: region or scope modifier [national], other modifier [safety in production], keyword [committee]. The derivation is then: apply rule one: <the national safety in production committee, #> -> <the national safety in production committee, National #>; apply rule two: <the national safety in production committee, National #> -> <the national safety in production committee, National Committee of #>; apply rule three: <the national safety in production committee, National Committee of #> -> <the national safety in production committee, National Committee of Industry Safety>.
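The derivation in this example can be reproduced with a short sketch. It is a simplification under stated assumptions: the source side of each rule is represented by the English gloss used in the text rather than the original Chinese string, rule probabilities are omitted, and the fallback to conventional machine translation plus transliteration for unseen parts is not modelled. The names `RULES`, `ORDER` and `derive` are hypothetical.

```python
# Target-side templates keyed by (part class, source-side gloss).
# "#" marks the slot that the next rule fills.
RULES = {
    ("region", "national"): "National #",
    ("keyword", "committee"): "Committee of #",
    ("other", "safety in production"): "Industry Safety",
}

# Translation order given in the description:
# region/scope modifier, then keyword, then other modifiers.
ORDER = ("region", "keyword", "other")

def derive(parts, rules):
    """Apply one rule per part class, filling the '#' slot left to right."""
    target = "#"
    steps = []
    for kind in ORDER:
        if kind not in parts:
            continue
        template = rules[(kind, parts[kind])]  # KeyError if the part is unseen
        target = target.replace("#", template, 1)
        steps.append(target)
    return target, steps

parts = {
    "region": "national",
    "other": "safety in production",
    "keyword": "committee",
}
translation, steps = derive(parts, RULES)
```

Because each part class is handled exactly once and in a fixed order, decoding is linear in the number of parts, which is consistent with the reduced time complexity the description claims for this decoding scheme.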
The query expansion is constructed by using the effective information extracted from the translation results as the internal features of the term, and combining it with co-occurring words as external features. Because the two ways of constructing the query expansion take into account both the internal features of the institution term entity and the co-occurrence information of the web pages in which the entity appears, effective bilingual snippet resources can be obtained. In addition, because bilingual snippets contain relatively few words and institution term entity recognition often introduces errors, the invention extracts translations directly from the bilingual snippets: it considers the translation information, length information and transliteration information of each candidate string, and outputs the candidate with the highest overall score as the translation. The invention extracts the effective information from the translation results with a probability-weighting algorithm and combines it with the translations of co-occurring topic words to construct the query expansion.
The choice of query expansion strongly affects the quantity and quality of the bilingual resources obtained. Analysis of the snippets returned after expansion shows that, compared with the results returned for the source query alone, their quality is clearly improved, and they largely contain the correct translation of the named entity.
Query construction based on institution term translation results counts, over the Top-N translation results, the N minimal translation units (characters or words) with the largest weighted probability, and uses them as the query expansion set constructed by this method. The weighted frequency probability is calculated by the following formula:
where p(T_i | α) is the confidence of the i-th translation result and c denotes a character or word in the results.
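The formula itself is not present in this text, so the following sketch only illustrates one plausible weighting and should not be read as the patent's formula. It assumes that the weight of a unit c is the sum of the confidences p(T_i | α) of the Top-N results that contain c, normalized by the total confidence. The function name and the toy translation results are hypothetical.

```python
from collections import defaultdict

def weighted_units(top_n_results, n_units):
    """Rank minimal translation units by confidence-weighted frequency.

    top_n_results is a list of (tokens, confidence) pairs, one per
    translation result T_i with confidence p(T_i | alpha).
    """
    total = sum(conf for _, conf in top_n_results)
    weight = defaultdict(float)
    for tokens, conf in top_n_results:
        for unit in set(tokens):  # count each unit once per result
            weight[unit] += conf / total
    # Highest weight first; ties are broken by string order.
    ranked = sorted(weight, key=lambda u: (-weight[u], u))
    return ranked[:n_units], dict(weight)

toy_results = [
    (["National", "Committee", "of", "Industry", "Safety"], 0.5),
    (["National", "Safety", "Production", "Committee"], 0.3),
    (["State", "Committee", "of", "Work", "Safety"], 0.2),
]
units, weights = weighted_units(toy_results, n_units=3)
```

Units that appear in many high-confidence results ("Committee", "Safety") rise to the top, which matches the stated goal of keeping the information the translation results agree on.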
The results of the query constructed from institution term entity translation results and of the co-occurring topic word translation query expansion are combined, and the translation result is then extracted. Translation extraction uses the effective query expansion to obtain bilingual web pages that contain the translation of the institution term entity. Because institution term entity recognition often introduces errors, entity recognition is not performed on the retrieved bilingual pages. To extract institution term translations, candidate translation strings are first extracted by combining a frequency variation measure with adjacency information. Next, the translation similarity, co-occurrence information, length information and transliteration information between each candidate translation string and the entity to be translated are computed; the feature scores are combined into an overall score, the candidates are sorted by that score, and the ranked translations are output.
Candidate translation strings are extracted using the frequency variation measure and adjacency information. The formula is as follows:
where s is a phrase composed of several words; freq(s) is the frequency of phrase s; x_i is the frequency of any single word in s, used together with the average frequency of all words in s; left_n is the number of distinct words adjacent to s on the left; and right_n is the number of distinct words adjacent to s on the right. In the candidate translation set of strings being drawn into, pass through the computer candidate String left and right.
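Since the formula is missing from this text, the sketch below does not attempt to reproduce the score. It only computes the quantities the formula is said to use (freq(s), the word frequencies x_i, their average, left_n and right_n) from tokenized snippets. The function name and the toy sentences are hypothetical.

```python
from collections import Counter

def phrase_statistics(phrase, sentences):
    """Collect the ingredients named in the text for a candidate phrase s.

    phrase is a list of words; sentences is a list of token lists.
    """
    n = len(phrase)
    word_freq = Counter(w for sent in sentences for w in sent)
    freq_s = 0
    left_words, right_words = set(), set()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            if sent[i:i + n] == phrase:
                freq_s += 1
                if i > 0:
                    left_words.add(sent[i - 1])   # word adjacent on the left
                if i + n < len(sent):
                    right_words.add(sent[i + n])  # word adjacent on the right
    xs = [word_freq[w] for w in phrase]
    return {
        "freq": freq_s,
        "word_freqs": xs,
        "mean_word_freq": sum(xs) / len(xs),
        "left_n": len(left_words),
        "right_n": len(right_words),
    }

toy_sentences = [
    ["National", "Committee", "of", "Industry", "Safety", "announced"],
    ["the", "Committee", "of", "Industry", "Safety", "said"],
    ["Industry", "news", "and", "Safety", "tips"],
]
stats = phrase_statistics(["Industry", "Safety"], toy_sentences)
```

Intuitively, a good candidate string occurs often as a whole and is surrounded by many different neighbouring words; how these quantities are combined into a single score is left to the formula in the original publication.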

Claims (7)

1. A Chinese-English translation method for institution terms, characterized in that the method comprises the following steps:
Step 1: Obtain the query expansion set corresponding to the institution term entity;
Step 2: Retrieve web resources using new queries that include the expansion set, obtaining mixed bilingual snippet resources;
Step 3: Extract candidate translations of the institution term entity from the mixed bilingual snippet resources and rank them by confidence;
Step 4: Obtain the translation result.
2. The Chinese-English translation method for institution terms according to claim 1, characterized in that the query expansion set of step 1 comprises: query construction from institution term entity translation results, and query expansion by translating co-occurring topic words,
wherein query construction from institution term entity translation results proceeds as follows: build institution term translation pairs; align each translation pair internally; extract language blocks according to the computed translation confidence; generate an institution term translation model based on the language blocks; and extract the effective information from the translation results,
and query expansion by translating co-occurring topic words proceeds as follows: submit the source query to a search engine and obtain source-language snippets that contain the source query; use TF-IDF to extract, from these snippets, the topic words that co-occur with the source query; then look up the translations of these topic words in a bilingual dictionary, the translations forming the final expansion set of this method.
3. The Chinese-English translation method for institution terms according to claim 2, characterized in that the internal alignment proceeds as follows: run the GIZA++ word alignment tool, which is widely used in machine translation, on the Chinese-English translation pairs of institution names in both directions (Chinese-to-English and English-to-Chinese), and take the intersection of the two alignment results as the alignment anchors; extract candidate strings by extending each alignment anchor to the left and to the right until the next alignment anchor is reached, the current anchor plus the extended words forming a candidate string; compute the translation confidence of each bilingual string pair; and, for each named entity translation pair, obtain the optimal alignment with a greedy algorithm.
4. The Chinese-English translation method for institution terms according to claim 2, characterized in that the translation confidence is computed by scoring the extracted translation fragments with a method similar to TF-IDF, and for a given Chinese string o and English string e the translation confidence is calculated as follows:
5. The Chinese-English translation method for institution terms according to claim 2, characterized in that the language block extraction uses a context-free translation decoding algorithm; an institution term is divided into three parts, namely a keyword part, a region or scope modifier part, and an other-modifier part; first, each aligned institution term entity pair is split into these three parts, and for the first two kinds of part the derivation position within the whole named entity is retained, yielding a series of derivation rules with corresponding confidences; the translation of a given named entity comprises: language block splitting, i.e., splitting the given institution term into the three parts; and entity derivation translation, in which the parts are translated in the order region or scope modifier, keyword, other modifiers; and if a part of some class does not occur in the training corpus, it is translated by conventional machine translation combined with transliteration.
6. The Chinese-English translation method for institution terms according to claim 3, characterized in that the greedy algorithm obtains the optimal alignment as follows: for a specific named entity pair, extract all {c, e} contained in the pair; sort them in descending order of score and store them in the set scoreArray; remove the first element {cc, ee} from scoreArray and update the alignment of the named entity pair according to {cc, ee}; delete all {cc, *} and {*, ee} from scoreArray; repeat the descending sort and the following steps until scoreArray is empty; the result is the best alignment of the named entity pair.
7. The Chinese-English translation method for institution terms according to claim 1, characterized in that the extraction of institution term entity translation candidates in step 3 combines a frequency variation measure with adjacency information to extract candidate translation strings; computes the translation similarity, co-occurrence information, length information and transliteration information between each candidate translation string and the entity to be translated; combines the feature scores into an overall score; sorts the candidates by that score; and outputs the ranked translations.
CN201710779839.3A 2017-09-01 2017-09-01 Institution term Chinese-English translation method Pending CN108733658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710779839.3A CN108733658A (en) 2017-09-01 2017-09-01 Institution term Chinese-English translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710779839.3A CN108733658A (en) 2017-09-01 2017-09-01 Institution term Chinese-English translation method

Publications (1)

Publication Number Publication Date
CN108733658A (en) 2018-11-02

Family

ID=63940332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710779839.3A Pending CN108733658A (en) 2017-09-01 2017-09-01 Institution term Chinese-English translation method

Country Status (1)

Country Link
CN (1) CN108733658A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101676898A (en) * 2008-09-17 2010-03-24 中国科学院自动化研究所 Method and device for translating Chinese organization name into English with the aid of network knowledge
CN106776560A (en) * 2016-12-15 2017-05-31 昆明理工大学 A kind of Kampuchean organization name recognition method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FEI HUANG ET.AL: "Mining Key Phrase Translations from Web Corpora", 《HUMAN LANGUAGE TECHNOLOGY CONFERENCE AND CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE》 *
李业刚: "双语最大名词短语分析及应用研究", 《中国博士学位论文全文数据库 信息科技辑》 *
李力沛 等: "基于修正TF-IDF的搜索引擎查询扩展模型", 《福建电脑》 *
李斌 等: "基于网络的中文未登录词译文挖掘方法研究", 《安徽广播电视大学学报》 *
马国来 等: "基于机器翻译语块的命名实体翻译方法研究", 《硅谷》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888940A (en) * 2019-10-18 2020-03-17 平安科技(深圳)有限公司 Text information extraction method and device, computer equipment and storage medium
CN110888940B (en) * 2019-10-18 2022-10-25 平安科技(深圳)有限公司 Text information extraction method and device, computer equipment and storage medium
CN111144111A (en) * 2019-12-30 2020-05-12 北京世纪好未来教育科技有限公司 Translation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106844352B (en) Word prediction method and system based on neural machine translation system
CN106649597B (en) Method for auto constructing is indexed after a kind of books book based on book content
CN110134772A (en) Medical text Relation extraction method based on pre-training model and fine tuning technology
CN112507065B (en) Code searching method based on annotation semantic information
CN103500160B (en) A kind of syntactic analysis method based on the semantic String matching that slides
CN111310480B (en) Weakly supervised Hanyue bilingual dictionary construction method based on English pivot
CN110866399B (en) Chinese short text entity recognition and disambiguation method based on enhanced character vector
Zhou et al. Chinese named entity recognition via joint identification and categorization
CN105975625A (en) Chinglish inquiring correcting method and system oriented to English search engine
CN105138514B (en) It is a kind of based on dictionary it is positive gradually plus a word maximum matches Chinese word cutting method
JP5449521B2 (en) Search device and search program
CN105068997B (en) The construction method and device of parallel corpora
CN106126620A (en) Method of Chinese Text Automatic Abstraction based on machine learning
CN106383818A (en) Machine translation method and device
CN104750687A (en) Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device
CN109308321A (en) A kind of knowledge question answering method, knowledge Q-A system and computer readable storage medium
CN103544309A (en) Splitting method for search string of Chinese vertical search
CN109597895B (en) Knowledge graph-based official document searching method
CN108038099B (en) Low-frequency keyword identification method based on word clustering
CN101593173A (en) A kind of reverse Chinese-English transliteration method and device
CN112051986B (en) Code search recommendation device and method based on open source knowledge
CN102779135A (en) Method and device for obtaining cross-linguistic search resources and corresponding search method and device
CN104375988A (en) Word and expression alignment method and device
CN105912522A (en) Automatic extraction method and extractor of English corpora based on constituent analyses
CN106156013B (en) A kind of two-part machine translation method that regular collocation type phrase is preferential

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181102