CN109062909A - A pluggable component - Google Patents
- Publication number: CN109062909A (application number CN201810809257.XA)
- Authority: CN (China)
- Prior art keywords: component, corpus, pluggable, language processing, terminology bank
- Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06F: ELECTRIC DIGITAL DATA PROCESSING
      - G06F40/00: Handling natural language data
        - G06F40/40: Processing or translation of natural language
          - G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
          - G06F40/42: Data-driven translation
Abstract
The present invention relates to a pluggable component comprising an input port, an output port, a terminology-bank loading port and a language-processing-component connection port. The body of the pluggable component includes an AC (Aho-Corasick) automaton module. The component can be used together with different terminology banks and language processing modules, while the component, the terminology banks and the language processing modules remain independent of one another. Because of this decoupled design, the component itself does not change when the functions of the terminology bank or the language processing module change; instead, it can be used directly with different terminology banks and language processing modules.
Description
Technical field
The invention belongs to the field of information technology and relates in particular to a pluggable component.
Background art
When text in multiple languages is exchanged, recognized and processed, it is usually necessary to identify the specific terms it contains, as well as general words that carry a non-conventional meaning in particular contexts, so that they can be handled separately during language processing and correct recognition and processing results are obtained.
In the prior art, a terminology bank is usually established, and terms are looked up and replaced by judging whether the processed corpus contains terms included in the terminology bank. However, the meanings of specific terms vary widely, and the way general words are rendered also changes from one context to another. This lookup-and-recognition process cannot be carried out automatically by a computer, because the machine cannot anticipate what these terms will look like after processing. As a result, the lookup and recognition can only be done manually, which is inefficient, and the preconfigured specific terminology bank provides no practical benefit.
Summary of the invention
To solve the above problems, the present invention provides a pluggable component that can be used together with different terminology banks and language processing modules, while the component, the terminology banks and the language processing modules remain independent of one another. Thanks to this decoupled design, the component itself does not change when the functions of the terminology bank or the language processing module change; on the contrary, it can be used directly with different terminology banks and language processing modules.
The technical solution of the invention is as follows:
A pluggable component comprises an input port, an output port, a terminology-bank loading port and a language-processing-component connection port; the body of the component includes an AC automaton module.
The input port is used to input the multilingual corpus to be processed into the pluggable component;
Through the AC automaton, the pluggable component loads, via the terminology-bank loading port, the terminology bank corresponding to the multilingual corpus, and performs first term processing on the multilingual corpus;
The language processing component receives the result of the first term processing and performs language identification and conversion on it;
The AC automaton receives the language identification and conversion result produced by the language processing component and performs second term processing on it;
The output port outputs the result of the second term processing.
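The processing order above can be sketched in Python as a small class whose methods play the role of the four ports. This is an illustrative sketch under stated assumptions, not the patented implementation: the class and method names and the `%T<n>%` marker format are hypothetical, the language processing component is modelled as any callable that takes and returns a string, and, for brevity, a plain longest-term-first substring scan stands in for the AC automaton.

```python
from typing import Callable, Dict, Tuple


class PluggableTermComponent:
    """Hypothetical sketch of the four-port component described above."""

    def __init__(self) -> None:
        self.term_bank: Dict[str, str] = {}               # source term -> target term
        self.processor: Callable[[str], str] = lambda s: s  # default: pass-through

    # Terminology-bank loading port: any bank can be plugged in.
    def load_term_bank(self, bank: Dict[str, str]) -> None:
        self.term_bank = dict(bank)

    # Language-processing-component connection port: any engine can be plugged in.
    def connect_processor(self, processor: Callable[[str], str]) -> None:
        self.processor = processor

    # Input port -> first term processing -> language processing
    # -> second term processing -> output port.
    def process(self, corpus: str) -> str:
        masked, slots = self._first_term_processing(corpus)
        converted = self.processor(masked)
        return self._second_term_processing(converted, slots)

    def _first_term_processing(self, text: str) -> Tuple[str, Dict[str, str]]:
        # Stand-in for the AC automaton: a plain substring scan (no word
        # boundaries), trying longer terms first so they win over shorter ones.
        slots: Dict[str, str] = {}
        for i, term in enumerate(sorted(self.term_bank, key=len, reverse=True)):
            marker = f"%T{i}%"
            if term in text:
                text = text.replace(term, marker)
                slots[marker] = self.term_bank[term]
        return text, slots

    def _second_term_processing(self, text: str, slots: Dict[str, str]) -> str:
        # Replace every surviving marker with the target term it stands for.
        for marker, target in slots.items():
            text = text.replace(marker, target)
        return text
```

As a usage example, with the built-in `str.upper` standing in for a language processing component, loading the bank `{"magazine": "MAGAZINE[ammo]"}` turns `"give me a magazine"` into `"GIVE ME A MAGAZINE[ammo]"`. Connecting a different callable or loading a different bank needs no change to the class, which mirrors the decoupling the summary describes.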
Optionally, the multilingual corpus to be processed is a corpus to be translated, the language processing component includes multiple translation components, and the language identification and conversion includes translating the corpus to be translated;
Optionally, the first term processing includes: according to the loaded terminology bank corresponding to the multilingual corpus, the AC automaton searches the input corpus to be translated for specific terms that meet the qualifying conditions, and replaces those specific terms in the corpus to be translated with special markers that cannot be recognized by the language processing component;
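The first term processing can be illustrated independently of how matches are found. The sketch below assumes that a matcher (such as the AC automaton) has already produced candidate `(start, end, term)` spans, possibly overlapping; it keeps a leftmost-longest, non-overlapping subset and replaces each kept span with a marker. The function name `mask_terms`, the `%T<n>%` marker format and the leftmost-longest selection policy are assumptions for illustration; the patent does not specify them.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, matched source term), end exclusive


def mask_terms(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace selected term spans with opaque markers.

    Returns the masked text and a map marker -> source term, which the
    second term processing can later use to insert the target term.
    """
    # Leftmost first; among spans starting at the same position, longest first.
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    kept: List[Span] = []
    cursor = 0
    for start, end, term in ordered:
        if start >= cursor:  # skip spans overlapping an earlier choice
            kept.append((start, end, term))
            cursor = end

    pieces: List[str] = []
    markers: Dict[str, str] = {}
    last = 0
    for i, (start, end, term) in enumerate(kept):
        marker = f"%T{i}%"
        pieces.append(text[last:start])  # untouched text before the term
        pieces.append(marker)            # the term itself is hidden
        markers[marker] = term
        last = end
    pieces.append(text[last:])
    return "".join(pieces), markers
```

For instance, with candidate spans `(9, 17, "magazine")` and `(9, 22, "magazine clip")` in `"load the magazine clip"`, the longer span wins and the text becomes `"load the %T0%"`.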
Optionally, the second term processing includes: based on the language identification and conversion result, the AC automaton uses the loaded terminology bank corresponding to the multilingual corpus to replace the special markers in that result with the target terms, thereby outputting the processed translation result.
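The second term processing amounts to a substitution over the converted text: each marker that survived language processing is replaced with the target term that the loaded terminology bank gives for the source term it stands for. This sketch assumes the marker map produced during the first term processing is kept alongside the request; `restore_terms` and the `%T<n>%` marker format are hypothetical. It also reports markers the engine failed to preserve, since a real engine might alter or drop them.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical marker format: "%T" + index + "%".
MARKER = re.compile(r"%T\d+%")


def restore_terms(converted: str, markers: Dict[str, str],
                  term_bank: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace each marker with the target term for its source term.

    Returns the restored text and the markers that did not survive
    language processing, so a caller could fall back or flag them.
    """
    def substitute(m: re.Match) -> str:
        source = markers.get(m.group(0))
        if source is None:
            return m.group(0)  # unknown marker: leave it untouched
        # Fall back to the source term if the bank has no entry for it.
        return term_bank.get(source, source)

    restored = MARKER.sub(substitute, converted)
    missing = [mk for mk in markers if mk not in converted]
    return restored, missing
```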
The present invention abandons the earlier approach to confirming term-linking relations, in which the term-linking relations in a parallel corpus are represented by model parameters and strongly bound to the translation component, so that they must be handled by the translation component's model during translation. The component of the present invention is decoupled from, and independent of, the translation component: it is a pluggable component that can be docked with any translation engine to identify term-linking relations in a parallel corpus.
Brief description of the drawings
Fig. 1 shows a prior-art terminology-bank recognition method;
Fig. 2 is an architecture diagram of the pluggable component of the invention;
Fig. 3 is a system diagram of the pluggable component of the invention used in translation processing.
Detailed description of embodiments
Referring to Fig. 1, suppose that language processing, for example translation, is to be performed on the corpus "give me a magazine". The corpus comes from an article describing a gunfight, so the correct translation should render "magazine" in its firearms sense (an ammunition magazine). Without a terminology bank, ordinary language processing produces varying outputs: some render it as "give me a magazine" in the periodical sense, some as "give me a periodical", some as "give me a text", and so on, and none of them is accurate. The translator can then only search manually; the search cannot be automated because the computer cannot determine which words to look for. The translator finds "magazine", "periodical", "text", "ordnance", etc. in the translation results one by one and manually replaces them with the correct term. This process is extremely inefficient. Thus, as shown in Fig. 1, even when a specific terminology bank is configured in advance, the process cannot be automated.
Referring to Fig. 2, the architecture diagram of the pluggable component of the invention, the component includes an input port (1), an output port (2), a terminology-bank loading port (3) and a language-processing-component connection port (4), and the body of the pluggable component includes an AC automaton module.
The input port (1) is used to input the multilingual corpus to be processed into the pluggable component;
Through the AC automaton, the pluggable component loads, via the terminology-bank loading port, the terminology bank corresponding to the multilingual corpus, and performs first term processing on the multilingual corpus;
The language processing component receives the result of the first term processing and performs language identification and conversion on it;
The AC automaton receives the language identification and conversion result produced by the language processing component and performs second term processing on it;
The output port outputs the result of the second term processing.
Referring to Fig. 3, for the corpus to be translated described with reference to Fig. 1, the process is as follows:
The multilingual corpus to be processed that enters through the input port (the corpus to be translated, i.e. the source-language input shown in the figure) is "give me a magazine". The pre-established terminology bank contains the entry "magazine → magazine (ammunition magazine)", and this entry can be restricted according to semantic context;
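One way to read "can be restricted according to semantic context" is that each terminology-bank entry carries a condition under which it applies. The sketch below assumes a hypothetical `TermEntry` record with a set of domain tags (for instance `"firearms"`; an empty set means unrestricted) and selects only the entries that fit the document's context before they are loaded into the automaton. The tag scheme and all names are assumptions; the patent does not specify how the restriction is expressed.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class TermEntry:
    source: str   # source-language term, e.g. "magazine"
    target: str   # target-language rendering to enforce
    # Hypothetical context restriction; an empty set means "always applies".
    domains: FrozenSet[str] = field(default_factory=frozenset)


def select_terms(bank: Iterable[TermEntry], context: FrozenSet[str]) -> Dict[str, str]:
    """Return source -> target for entries whose restriction fits the context."""
    selected: Dict[str, str] = {}
    for entry in bank:
        if not entry.domains or entry.domains & context:
            selected[entry.source] = entry.target
    return selected
```

With a bank holding a firearms-only entry for "magazine" and an unrestricted entry for "rifle", a firearms document loads both terms, while a publishing document loads only "rifle".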
The terms of the terminology bank are loaded into the AC automaton.
Next, the AC automaton is used to match the corpus to be translated, and each matched source term is replaced with a special marker.
Specifically, the AC automaton automatically matches "magazine" as a source term; semantic context can be taken into account during matching, and this is implemented through the AC automaton mechanism.
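The AC automaton referred to throughout is the Aho-Corasick multi-pattern string-matching automaton: a trie of all terms plus failure links, so that one left-to-right pass over the text reports every occurrence of every term. The following is a minimal, textbook-style Python sketch (class and method names are hypothetical). It matches raw character substrings and does not itself model the semantic-context handling mentioned above, which would have to be layered on top, for example by filtering which terms are loaded.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Textbook Aho-Corasick automaton over characters."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[str]] = [[]]        # patterns recognized at each state
        for p in patterns:
            self._insert(p)
        self._build_failure_links()

    def _insert(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first, so every failure target is finished before it is used.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target (suffix matches).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text: str) -> List[Tuple[int, int, str]]:
        """Return (start, end, pattern) for every occurrence, end exclusive."""
        matches: List[Tuple[int, int, str]] = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                matches.append((i + 1 - len(p), i + 1, p))
        return matches
```

For example, `AhoCorasick(["magazine"]).find_all("give me a magazine")` returns `[(10, 18, "magazine")]`, and the classic pattern set `["he", "she", "his", "hers"]` over `"ushers"` yields `[(1, 4, "she"), (2, 4, "he"), (2, 6, "hers")]` in a single pass.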
Because the source term may not be translated correctly, the present invention replaces it with a special marker such as "%…%"; any marker will do as long as the translation array cannot recognize it;
At this point, "give me a %…%" enters the translation array as a marker sequence, and the translated corpus returned is the target-language equivalent of "give me a %…%", with the marker left intact;
The terminology bank is then used again to replace the special marker with the target term that should be substituted.
At this point, "give me a %…%" yields the correct translation result, equivalent to "give me a magazine" with "magazine" in its ammunition sense (the target-language output shown in the figure).
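The Fig. 3 walk-through can be reproduced end to end with a toy engine. In this sketch the hypothetical `toy_engine` is a word-by-word English-to-Chinese lookup that, like the ordinary outputs described for Fig. 1, renders "magazine" as 杂志 (periodical), while the terminology bank maps it to 弹匣 (ammunition magazine). For brevity, the term matching uses a regular-expression alternation (longest term first, whole words only) instead of an AC automaton; the point here is the mask, translate, restore sequence. Function names and the `%T<n>%` marker format are assumptions.

```python
import re
from typing import Callable, Dict


def toy_engine(text: str) -> str:
    """Hypothetical word-by-word English->Chinese engine; unknown tokens pass through."""
    lexicon = {"give": "给", "me": "我", "a": "一个", "magazine": "杂志"}
    return "".join(lexicon.get(tok, tok) for tok in text.split())


def translate_with_terms(text: str, term_bank: Dict[str, str],
                         engine: Callable[[str], str]) -> str:
    if not term_bank:
        return engine(text)  # nothing to protect

    # First term processing: whole-word matches, longest terms tried first
    # (Python's alternation takes the first alternative that matches).
    terms = sorted(term_bank, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    markers: Dict[str, str] = {}

    def mask(m: re.Match) -> str:
        marker = f"%T{len(markers)}%"
        markers[marker] = term_bank[m.group(1)]  # remember the target term
        return marker

    masked = pattern.sub(mask, text)

    # Language processing by whatever engine is plugged in.
    converted = engine(masked)

    # Second term processing: put the target terms back in place of markers.
    for marker, target in markers.items():
        converted = converted.replace(marker, target)
    return converted
```

Called directly, `toy_engine("give me a magazine")` returns `"给我一个杂志"` (the periodical sense), whereas `translate_with_terms("give me a magazine", {"magazine": "弹匣"}, toy_engine)` returns `"给我一个弹匣"`, because the engine only ever sees `"give me a %T0%"`.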
As the above steps show, combining the terminology bank with the AC automaton allows the process to be automated while ensuring accurate results, which greatly improves efficiency.
In addition, the invention is designed in a pluggable, decoupled form, so it can conveniently be used with a variety of terminology banks and multilingual processing components, such as translation engines.
The pluggable component of the invention uses the AC automaton for multi-pattern string matching, which ensures that the dictionary terms occurring in the source sentence are found. Especially when the terminology bank is very large, the overall time cost of determining which terms occur in a sentence can drop to the logarithmic order of the original.
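To make the scaling argument concrete, the standard textbook bounds can be compared. Let n be the sentence length, k the number of terms t_1, ..., t_k in the bank, m their total length and z the number of reported occurrences. Running a linear-time single-pattern search once per term grows with k, whereas the Aho-Corasick automaton is built once from the bank and then scans the sentence once, independently of k (the O(m) build bound assumes a fixed alphabet):

```latex
\begin{aligned}
T_{\text{per-term}} &= \sum_{i=1}^{k} O\bigl(n + |t_i|\bigr) = O(kn + m) \\
T_{\text{AC}} &= \underbrace{O(m)}_{\text{build once}} + \underbrace{O(n + z)}_{\text{one scan per sentence}}
\end{aligned}
```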
Claims (6)
1. A pluggable component, comprising an input port, an output port, a terminology-bank loading port and a language-processing-component connection port, characterized in that:
the body of the pluggable component includes an AC automaton module;
the input port is used to input the multilingual corpus to be processed into the pluggable component;
the pluggable component, by means of the AC automaton, loads through the terminology-bank loading port the terminology bank corresponding to the multilingual corpus, and performs first term processing on the multilingual corpus;
the language processing component receives the result of the first term processing and performs language identification and conversion on it;
the AC automaton receives the language identification and conversion result produced by the language processing component and performs second term processing on it;
the output port outputs the result of the second term processing.
2. The pluggable component according to claim 1, wherein the multilingual corpus to be processed is a corpus to be translated, the language processing component includes multiple translation components, and the language identification and conversion includes translating the corpus to be translated.
3. The pluggable component according to claim 2, wherein the first term processing comprises: according to the loaded terminology bank corresponding to the multilingual corpus, the AC automaton searching the input corpus to be translated for specific terms that meet the qualifying conditions, and replacing those specific terms in the corpus to be translated with special markers that cannot be recognized by the language processing component.
4. The pluggable component according to claim 3, wherein the second term processing comprises: based on the language identification and conversion result, the AC automaton replacing, by means of the loaded terminology bank corresponding to the multilingual corpus, the special markers in that result with target terms, thereby outputting the processed translation result.
5. The pluggable component according to claim 2, wherein the pluggable component and the language processing component are decouplable.
6. The pluggable component according to claim 2, wherein the pluggable component and the terminology bank are decouplable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810809257.XA (published as CN109062909A) | 2018-07-23 | 2018-07-23 | A pluggable component |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109062909A | 2018-12-21 |
Family ID: 64836087
Citations (5; * cited by examiner)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102693309A * | 2011-05-26 | 2012-09-26 | 中国科学院计算技术研究所 | Candidate phrase querying method and aided translation system for computer aided translation |
CN104781791A * | 2011-12-05 | 2015-07-15 | 持续电信解决方案公司 | Universal pluggable cloud disaster recovery system |
CN106250375A * | 2016-08-09 | 2016-12-21 | 北京百度网讯科技有限公司 | Translation processing method and device |
CN108009160A * | 2017-11-30 | 2018-05-08 | 北京金山安全软件有限公司 | Corpus translation method and device containing named entity, electronic equipment and storage medium |
CN108228574A * | 2017-12-07 | 2018-06-29 | 科大讯飞股份有限公司 | Text translation processing method and device |
Similar Documents
Publication | Title |
---|---|
CN111753099B | Method and system for enhancing relevance of archive entity based on knowledge graph |
CN109299480B | Context-based term translation method and device |
CN109726298B | Knowledge graph construction method, system, terminal and medium suitable for scientific and technical literature |
CN101667176A | Phrase-based statistical machine translation method and system |
CN112035599B | Query method and device based on vertical search, computer equipment and storage medium |
CN108959276A | A term discovery method and system for translation |
US20220138240A1 | Source code retrieval |
CN112464662A | Medical phrase matching method, device, equipment and storage medium |
CN112528681A | Cross-language retrieval and model training method, device, equipment and storage medium |
CN113836314B | Knowledge graph construction method, device, equipment and storage medium |
CN110263127A | Method and device for text search based on user query words |
CN115630843A | Contract clause automatic checking method and system |
CN111091009B | Document association auditing method based on semantic analysis |
CN114398968B | Method and device for labeling similar customer-obtaining files based on file similarity |
Shanmugalingam et al. | Language identification at word level in Sinhala-English code-mixed social media text |
CN108984540A | A method for supplementary translation and an auxiliary translation system |
CN109062909A | A pluggable component |
CN112380848A | Text generation method, device, equipment and storage medium |
CN113139558A | Method and apparatus for determining a multi-level classification label for an article |
CN114090620B | Query request processing method and device |
CN114461665B | Method, apparatus and computer program product for generating a statement transformation model |
WO2012091539A1 | A semantic similarity matching system and a method thereof |
CN114139543A | Entity link corpus labeling method and device |
WO2016059505A1 | A system and a method for recognition of aerospace parts in unstructured text |
CN109165297B | Universal entity linking device and method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181221 |