CN109062909A - A kind of pluggable component - Google Patents


Info

Publication number
CN109062909A
Authority
CN
China
Prior art keywords
component
corpus
pluggable
language processing
terminology bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810809257.XA
Other languages
Chinese (zh)
Inventor
李靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Expressive Language Networking Polytron Technologies Inc
Original Assignee
Expressive Language Networking Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Expressive Language Networking Polytron Technologies Inc
Priority to CN201810809257.XA
Publication of CN109062909A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Abstract

The present invention relates to a pluggable component comprising an input port, an output port, a terminology bank loading port, and a language processing component connection port. The body of the pluggable component includes an AC (Aho-Corasick) automaton module. The component can be used together with different terminology banks and language processing modules, yet the component, the terminology bank, and the language processing module remain mutually independent. Because of this decoupled design, the component itself does not change when the function of the terminology bank or the language processing module changes; instead, it can be used directly with different terminology banks and language processing modules.

Description

A kind of pluggable component
Technical field
The invention belongs to the field of information technology, and in particular relates to a pluggable component.
Background art
In multilingual communication, processing, and recognition, it is usually necessary to identify specific terms, as well as general terms that lose their conventional meaning in certain contexts, so that language processing can screen them and arrive at correct recognition and processing results.
In the prior art, a terminology bank is usually established, and search-and-replace identification is performed by judging whether the corpus being processed contains terms included in the terminology bank. However, the meanings of specific terms vary, and the way a general term is rendered under different specific contexts also tends to change. A computer cannot carry out this search and identification automatically, because the machine cannot anticipate what these terms will look like after processing. The search and identification can therefore only be done manually, which is inefficient, and the preconfigured specific terminology bank ends up having no practical effect.
Summary of the invention
To solve the above problems, the present invention provides a pluggable component that can be used together with different terminology banks and language processing modules, while the component, the terminology bank, and the language processing module remain mutually independent. Because of this decoupled design, the component itself does not change when the function of the terminology bank or the language processing module changes; instead, it can be used directly with different terminology banks and language processing modules.
The technical solution of the invention is as follows:
A pluggable component comprises an input port, an output port, a terminology bank loading port, and a language processing component connection port, and the body of the pluggable component includes an AC (Aho-Corasick) automaton module.
The input port is used to input the multilingual corpus to be processed into the pluggable component;
The pluggable component, by means of the AC automaton, loads the terminology bank corresponding to the multilingual corpus through the terminology bank loading port, and performs first term processing on the multilingual corpus;
The language processing component receives the result of the first term processing and performs language identification and conversion on it;
The AC automaton receives the result of the language identification and conversion performed by the language processing component and performs second term processing on it;
The output port outputs the result of the second term processing.
Optionally, the multilingual corpus to be processed is a corpus to be translated, the language processing component includes multiple translation components, and the language identification and conversion include translating the corpus to be translated;
Optionally, the first term processing includes: according to the loaded terminology bank corresponding to the multilingual corpus, the AC automaton searches the input corpus to be translated for specific terms that meet the qualifying conditions, and replaces each such term in the corpus with a special marker that the language processing component cannot recognize;
Optionally, the second term processing includes: based on the loaded terminology bank corresponding to the multilingual corpus, the AC automaton replaces the special markers in the result of the language identification and conversion with the target terms, and thereby outputs the processed translation result.
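The first and second term processing described above can be sketched as a pair of functions. This is a minimal illustration, not the patented implementation: the function names, the indexed marker format `%0%`, `%1%`, and the use of a regular-expression alternation as a stand-in for the AC automaton are all assumptions made for the example.

```python
import re


def mask_terms(text, termbase):
    """First term processing: replace each source term with an indexed marker.

    `termbase` maps source terms to target terms (hypothetical structure).
    Returns the masked text and the list of target terms, one per marker.
    """
    if not termbase:
        return text, []
    # Longest terms first, so a longer entry wins over a shorter prefix.
    # A regex alternation stands in here for the AC automaton.
    terms = sorted(termbase, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    slots = []  # slots[i] holds the target term for marker %i%

    def _replace(match):
        slots.append(termbase[match.group(0)])
        return f"%{len(slots) - 1}%"

    return pattern.sub(_replace, text), slots


def unmask_terms(text, slots):
    """Second term processing: replace each marker with its target term."""
    return re.sub(r"%(\d+)%", lambda m: slots[int(m.group(1))], text)


if __name__ == "__main__":
    masked, slots = mask_terms(
        "load the magazine into the rifle",
        {"magazine": "TGT_MAGAZINE", "rifle": "TGT_RIFLE"},
    )
    print(masked)
    print(unmask_terms(masked, slots))
```

Indexing the markers lets the second pass map every marker back to its own target term even if the language processing component reorders the sentence.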
The present invention abandons the approach of earlier methods for confirming term chain-reference relations, in which the relation between terms in a parallel corpus is represented by model parameters and is therefore forcibly bound to the model that the translation component uses at translation time. The component of the invention is decoupled from, and independent of, the translation component. It is a pluggable component that can dock with any translation engine to identify term chain-reference relations in a parallel corpus.
Brief description of the drawings
Fig. 1 shows the terminology bank recognition method of the prior art.
Fig. 2 is an architecture diagram of the pluggable component of the invention.
Fig. 3 is a detailed system diagram of the pluggable component of the invention applied to a translation process.
Detailed description of the embodiments
Referring to Fig. 1, suppose that some corpus, "give me a magazine", needs language processing, for example translation. The corpus comes from an article describing a gunfight, so the correct translation must render "magazine" in its ammunition sense (a cartridge magazine), not as a periodical.
If no terminology bank is established, ordinary language processing produces varying output: some results render the term as a magazine in the periodical sense, some as "periodical", some as "text", and so on, and all of them are inaccurate. A translator can then only search manually; automatic search is impossible because the computer cannot determine which word to look for. The translator finds "magazine, periodical, text, ordnance" and the like one by one in the translation result and manually replaces each with the correct term. This process is extremely inefficient. Thus, even when a specific terminology bank has been configured in advance, the process cannot be automated, as shown in Fig. 1.
Fig. 2 is an architecture diagram of the pluggable component of the invention, which comprises an input port (1), an output port (2), a terminology bank loading port (3), and a language processing component connection port (4). The body of the pluggable component includes an AC (Aho-Corasick) automaton module.
The input port (1) is used to input the multilingual corpus to be processed into the pluggable component;
The pluggable component, by means of the AC automaton, loads the terminology bank corresponding to the multilingual corpus through the terminology bank loading port, and performs first term processing on the multilingual corpus;
The language processing component receives the result of the first term processing and performs language identification and conversion on it;
The AC automaton receives the result of the language identification and conversion performed by the language processing component and performs second term processing on it;
The output port outputs the result of the second term processing.
Referring to Fig. 3, for the corpus to be translated that was described for Fig. 1, the process is as follows:
The multilingual corpus to be processed that enters through the input port (the corpus to be translated, shown as the source language input in the figure) is "give me a magazine". The pre-established terminology bank contains the entry "magazine - magazine (ammunition sense)", a pair of source term and target term; the term can be qualified according to its semantic context.
The terms of the terminology bank are loaded into the AC automaton.
Next, the AC automaton is used to match the corpus to be translated, and each matched source term is replaced with a special marker.
Specifically, "magazine" is a source term that the AC automaton matches automatically. The matching process can take semantic context into account and is realized by the AC automaton mechanism.
Because the source term cannot be translated correctly, the invention replaces it with some special marker, such as "% ... %"; any marker will do as long as the translation array cannot recognize it.
At this point, "give me a % ... %" enters the translation array as a marked sequence, and the translated corpus that comes back is the target-language rendering of "give me a % ... %", with the marker left untouched.
The terminology bank is then used again to replace the special marker with the target term that is required.
At this point, the marked translation yields the correct translation result: "give me a magazine", with "magazine" rendered in its ammunition sense (shown as the target language output in the figure).
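The steps of this example can be traced in a few lines. The sketch below is illustrative only: `stub_engine` is a hypothetical stand-in for the translation array, and the Chinese strings (including the target term for the ammunition sense) are assumed renderings, since this page does not preserve the original target-language text. The marker `%0%` is likewise an assumed concrete form of the "% ... %" marker.

```python
def stub_engine(text):
    """Hypothetical stand-in for a translation engine (not a real MT system).

    It translates word by word from a toy lexicon and, like a generic engine,
    picks the periodical sense of "magazine". Tokens it does not know,
    including the special marker, pass through unchanged.
    """
    lexicon = {"give": "给", "me": "我", "a": "一个", "magazine": "杂志"}
    return "".join(lexicon.get(token, token) for token in text.split())


# Assumed terminology bank entry: source term -> target term (ammunition sense).
TERMBASE = {"magazine": "弹匣"}
MARKER = "%0%"  # assumed marker that the engine cannot recognize

source = "give me a magazine"

# Without the component, the engine picks the wrong sense.
unmasked_output = stub_engine(source)

# Step 1: first term processing replaces the source term with the marker.
masked = source.replace("magazine", MARKER)

# Step 2: the engine translates everything except the marker.
translated = stub_engine(masked)

# Step 3: second term processing restores the target term.
final = translated.replace(MARKER, TERMBASE["magazine"])

if __name__ == "__main__":
    for label, value in [("no termbase", unmasked_output),
                         ("masked", masked),
                         ("translated", translated),
                         ("final", final)]:
        print(f"{label}: {value}")
```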
As the above steps show, combining the terminology bank with the AC automaton allows the whole process to be automated while keeping the result accurate, which greatly improves efficiency.
In addition, the invention is designed in a pluggable, decoupled form, so it can conveniently be used together with a variety of terminology banks and a variety of language processing components, such as translation engines.
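The decoupled design can be pictured as a class whose four ports are ordinary methods, so that terminology banks and engines are swapped without touching the component. This is a sketch under stated assumptions: the class name, method names, and marker format are hypothetical, plain `str.replace` stands in for the AC automaton, and the "engines" are trivial string functions used only to show the plumbing.

```python
from typing import Callable, Dict, List

# Any language processing component is modelled as "text in, text out".
Engine = Callable[[str], str]


class PluggableComponent:
    """Hypothetical sketch of the component with its four ports."""

    def __init__(self) -> None:
        self._termbase: Dict[str, str] = {}
        self._engine: Engine = lambda text: text  # identity until one is connected

    def load_termbase(self, termbase: Dict[str, str]) -> None:
        """Terminology bank loading port."""
        self._termbase = dict(termbase)

    def connect_engine(self, engine: Engine) -> None:
        """Language processing component connection port."""
        self._engine = engine

    def process(self, text: str) -> str:
        """Input port -> first term processing -> engine -> second -> output port."""
        slots: List[str] = []
        for source_term, target_term in self._termbase.items():
            if source_term in text:
                text = text.replace(source_term, f"%{len(slots)}%")
                slots.append(target_term)
        text = self._engine(text)
        for index, target_term in enumerate(slots):
            text = text.replace(f"%{index}%", target_term)
        return text


if __name__ == "__main__":
    component = PluggableComponent()
    component.load_termbase({"magazine": "TERM_A"})
    component.connect_engine(str.upper)   # first stand-in engine
    print(component.process("give me a magazine"))
    component.connect_engine(str.title)   # swap the engine only
    print(component.process("give me a magazine"))
    component.load_termbase({"magazine": "TERM_B"})  # swap the termbase only
    print(component.process("give me a magazine"))
```

Neither swap requires any change inside `process`, which is the property the decoupled design is after.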
The pluggable component of the invention uses the AC automaton for multi-pattern string matching, which guarantees that the terms in the dictionary are matched in the source sentence. Especially when the terminology bank is very large, the overall time cost of determining which terms occur in a sentence can drop to the logarithmic order of the original cost.
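For readers unfamiliar with multi-pattern matching, the following is a compact sketch of an Aho-Corasick automaton, assuming that "AC automaton" in the text refers to this standard algorithm. The class and method names are illustrative and the patent does not disclose its own implementation. Once the automaton is built, the scan visits each character of the text once and follows failure links instead of restarting, so its cost does not grow with the number of terms in the terminology bank beyond the matches it reports.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton for multi-pattern string matching."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (the trie)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns recognized on reaching each state
        for pattern in patterns:
            if pattern:    # skip empty patterns
                self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first order: depth-1 states keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure state (suffix matches).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Return (start_index, pattern) for every occurrence in `text`."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


if __name__ == "__main__":
    automaton = AhoCorasick(["magazine", "machine gun", "gun"])
    print(automaton.find_all("give me a magazine and a machine gun"))
```

Overlapping terms such as "machine gun" and "gun" are both reported; a component like the one described would still need a policy (for example, longest match first) to decide which one to replace.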

Claims (6)

1. A pluggable component, comprising an input port, an output port, a terminology bank loading port, and a language processing component connection port, characterized in that:
The body of the pluggable component includes an AC automaton module;
The input port is used to input the multilingual corpus to be processed into the pluggable component;
The pluggable component, by means of the AC automaton, loads the terminology bank corresponding to the multilingual corpus through the terminology bank loading port, and performs first term processing on the multilingual corpus;
The language processing component receives the result of the first term processing and performs language identification and conversion on it;
The AC automaton receives the result of the language identification and conversion performed by the language processing component and performs second term processing on it;
The output port outputs the result of the second term processing.
2. The pluggable component of claim 1, wherein the multilingual corpus to be processed is a corpus to be translated, the language processing component includes multiple translation components, and the language identification and conversion include translating the corpus to be translated.
3. The pluggable component of claim 2, wherein the first term processing includes: according to the loaded terminology bank corresponding to the multilingual corpus, the AC automaton searches the input corpus to be translated for specific terms that meet the qualifying conditions, and replaces each such term in the corpus to be translated with a special marker that the language processing component cannot recognize.
4. The pluggable component of claim 3, wherein the second term processing includes: based on the loaded terminology bank corresponding to the multilingual corpus, the AC automaton replaces the special markers in the result of the language identification and conversion with the target terms, and thereby outputs the processed translation result.
5. The pluggable component of claim 2, wherein the pluggable component and the language processing component are decoupled from each other.
6. The pluggable component of claim 2, wherein the pluggable component and the terminology bank are decoupled from each other.
CN201810809257.XA 2018-07-23 2018-07-23 A kind of pluggable component Pending CN109062909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810809257.XA CN109062909A (en) 2018-07-23 2018-07-23 A kind of pluggable component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810809257.XA CN109062909A (en) 2018-07-23 2018-07-23 A kind of pluggable component

Publications (1)

Publication Number Publication Date
CN109062909A true CN109062909A (en) 2018-12-21

Family

ID=64836087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810809257.XA Pending CN109062909A (en) 2018-07-23 2018-07-23 A kind of pluggable component

Country Status (1)

Country Link
CN (1) CN109062909A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693309A (en) * 2011-05-26 2012-09-26 中国科学院计算技术研究所 Candidate phrase querying method and aided translation system for computer aided translation
CN104781791A (en) * 2011-12-05 2015-07-15 持续电信解决方案公司 Universal pluggable cloud disaster recovery system
CN106250375A (en) * 2016-08-09 2016-12-21 北京百度网讯科技有限公司 Translation processing method and device
CN108009160A (en) * 2017-11-30 2018-05-08 北京金山安全软件有限公司 Corpus translation method and device containing named entity, electronic equipment and storage medium
CN108228574A (en) * 2017-12-07 2018-06-29 科大讯飞股份有限公司 Text translation processing method and device

Similar Documents

Publication Publication Date Title
CN111753099B (en) Method and system for enhancing relevance of archive entity based on knowledge graph
CN109299480B (en) Context-based term translation method and device
CN109726298B (en) Knowledge graph construction method, system, terminal and medium suitable for scientific and technical literature
CN101667176A (en) Method and system for counting machine translation based on phrases
CN112035599B (en) Query method and device based on vertical search, computer equipment and storage medium
CN108959276A (en) A kind of term discovery method and its system for translation
US20220138240A1 (en) Source code retrieval
CN112464662A (en) Medical phrase matching method, device, equipment and storage medium
CN112528681A (en) Cross-language retrieval and model training method, device, equipment and storage medium
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN110263127A (en) Text search method and device is carried out based on user query word
CN115630843A (en) Contract clause automatic checking method and system
CN111091009B (en) Document association auditing method based on semantic analysis
CN114398968B (en) Method and device for labeling similar customer-obtaining files based on file similarity
Shanmugalingam et al. Language identification at word level in Sinhala-English code-mixed social media text
CN108984540A (en) A kind of method and auxiliary translation system of supplementary translation
CN109062909A (en) A kind of pluggable component
CN112380848A (en) Text generation method, device, equipment and storage medium
CN113139558A (en) Method and apparatus for determining a multi-level classification label for an article
CN114090620B (en) Query request processing method and device
CN114461665B (en) Method, apparatus and computer program product for generating a statement transformation model
WO2012091539A1 (en) A semantic similarity matching system and a method thereof
CN114139543A (en) Entity link corpus labeling method and device
WO2016059505A1 (en) A system and a method for recognition of aerospace parts in unstructured text
CN109165297B (en) Universal entity linking device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221