CN109871548A - A kind of patent document interpretation method - Google Patents
A patent document translation method
- Publication number
- CN109871548A
- Authority
- CN
- China
- Prior art keywords
- statement
- translation
- patent document
- translated
- interlude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a patent document translation method. A reference sentence library is built by extracting one or more similar patent documents written in the same language as the patent document to be translated, together with one or more of their patent family members written in the target language. High-frequency words and/or high-frequency phrases extracted from the patent document to be translated serve as source terms, for which one or more reference term translations are extracted from the reference sentence library and provided to the user to generate target-language term translations. For each sentence segment of the patent document to be translated, one or more related reference translation segments are automatically extracted from the reference sentence library, and/or reference translation segments are automatically generated based on syntactic analysis and the reference sentence library, and provided to the user to generate target-language translation segments. The user can thus quickly and easily build a reference sentence library even when no translation memory for a related project exists, which improves translation efficiency and the rigor and consistency of the translation and reduces the error rate.
Description
Technical field
The present invention relates to translation technology, and more particularly to a patent document translation method.
Background
With the spread of the Internet, computer processing of natural language has become an important means of obtaining knowledge from the Internet. For example, in fields such as international exchange, scientific research and education, people need to translate foreign languages, which in the past was the domain of expert linguists. With the rapid development and continuous improvement of computer technology, machine translation has become increasingly widely used. Machine translation has significant advantages of its own, such as high translation speed and large memory capacity, and it can also reduce translation costs. Its drawback is that translation quality still falls far short of what users need, so developing high-quality machine translation methods has become a pressing problem.
Because documents in different technical fields have their own characteristics, current general-purpose machine translation is hard to apply across all technical fields. A patent document is both a technical document and a legal document, so its translation demands accuracy and rigor and is held to a higher quality standard; patent translators therefore mostly translate every sentence manually. Although a translator can work from a family application, the related patents must first be searched and the relevant text selected by hand, which is inefficient and error-prone. Owing to the way patent documents are drafted, they contain many conventional patent phrases, such as application number, applicant, abstract, claims, description, technical field, background art, summary of the invention and specific embodiments. They also contain many fixed and common sentence patterns, such as "said ...", "the present invention provides a ... method", "the ... provided by the present invention has at least the following advantages" and "the scope of the claims of this patent is ...". Patent application documents further contain a large amount of high-frequency vocabulary; for example, terms that appear in the claims recur throughout the summary of the invention and the specific embodiments. These characteristics mean that a great deal of reference information is available when translating a patent document.
Building on these characteristics, the Chinese patent application published as CN103488627A discloses a full-text patent document translation method and translation system. That system performs morphological analysis on the full text to obtain phrases, translates the phrases into the target language, and identifies and corrects errors in the translation results. During full-text translation, the noun phrases are replaced directly with the corrected results, and after translation is complete the output follows the order of the source text. Although that patent can obtain the complex noun phrases commonly used in patent documents, reduce the time spent analyzing sentences that contain them, and increase translation speed, it still has the following defects:
(1) The selected set of patent documents is broad and poorly targeted; using it as the reference translation for the phrases to be translated easily leads to mistranslation of technical terms;
(2) In full-text translation only the phrase translations are corrected, while sentences are translated directly by the translation system that the patent provides, so sentence translation accuracy is low;
(3) The patent applies only to the translation of patent application documents and is not suitable for other patent-related documents such as notifications and notification forwarding letters;
(4) The method does not fully consider the translation characteristics of proper nouns in patent documents or the abundance of reference information available in patent literature databases.
Most importantly, the translation method provided by that patent mainly extracts, from the patent document set, the high-frequency phrases corresponding to the phrase to be translated and uses them as the reference translation. Simply choosing high-frequency phrase translations as the reference does not take into account the meaning of the phrase within the patent document. Like common translation software, it easily mistranslates phrases, and technical terms in particular, so that the translation of the patent document loses its rigor and professional character.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a patent document translation method comprising the following steps:
A. extracting one or more similar patent documents written in the same language as the patent document to be translated, together with one or more of their patent family members written in the target language, and storing them in a reference sentence library;
B. using high-frequency words and/or phrases extracted from the patent document to be translated as source terms, extracting one or more reference term translations from the reference sentence library, and providing them to the user to generate target-language term translations;
C. for each segment to be translated of the patent document to be translated, automatically extracting from the reference sentence library one or more reference translation segments related to that segment, and/or automatically generating reference translation segments based on syntactic analysis and the reference sentence library, and providing them to the user to generate target-language translation segments;
D. storing the sentence segments of the patent document to be translated and their corresponding translation segments in the reference sentence library for use in subsequent translation.
The similar patent documents, patent family members, patent document to be translated and translation segments are split into reference sentence segments, using sentence-final punctuation as the separator, and stored in the reference sentence library. The patent document to be translated is split into segments to be translated using both sentence-final and in-sentence punctuation as separators. Similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are provided to the user in priority order. The similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the reference sentence segments split from them are assigned different priorities according to the weights. The keywords include applicant, inventor, title of the invention, claims and abstract information, and the keyword weights can be preset or adjusted by the user.
Specific embodiment
The technical solution of the present invention is described in further detail below, but the scope of protection of the present invention is not limited to the following description.
Embodiment:
A patent translation method comprises the following steps. One or more similar patent documents and one or more of their patent family members are extracted and stored in a reference sentence library. High-frequency words and/or high-frequency phrases extracted from the patent document to be translated are used as source terms, and one or more reference term translations are extracted from the reference sentence library and provided to the user to generate target-language term translations.
For each segment to be translated of the patent document to be translated, one or more reference translation segments related to that segment are automatically extracted from the reference sentence library, and/or reference translation segments are automatically generated based on syntactic analysis and the reference sentence library, and are provided to the user to generate target-language translation segments.
The sentence segments of the patent document to be translated and their corresponding translation segments are stored in the reference sentence library for use in subsequent translation. The language of the document to be translated is the source language, and the language into which it is to be translated is the target language. The similar documents are in the source language, and the patent family members are in the target language.
The similar patent documents, patent family members, patent document to be translated and translation segments are split into reference sentence segments, using sentence-final punctuation as the separator, and stored in the reference sentence library.
The patent document to be translated is split into segments to be translated using both sentence-final and in-sentence punctuation as separators.
Because reference sentence segments and segments to be translated are split in different ways, the segments in the reference sentence library are longer than the segments to be translated. During retrieval, reference sentence segments that contain the segment to be translated can therefore be found as comprehensively as possible, and the surrounding context helps the user understand the segment and select a suitable translation.
The source language may be Chinese, Japanese, English, Korean, German, French, Spanish, Italian, Thai, Russian, and so on. The target language may likewise be Chinese, Japanese, English, Korean, German, French, Spanish, Italian, Thai, Russian, and so on.
The sentence-final punctuation includes one or more of the full stop, question mark and exclamation mark. The in-sentence punctuation includes one or more of the comma, enumeration comma (、), semicolon and colon.
Preferably, the sentence-final punctuation is the full stop, and the in-sentence punctuation is the comma and/or semicolon.
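The two splitting rules can be illustrated with a minimal Python sketch. It assumes the preferred punctuation sets (full stop as the sentence-final mark; comma and semicolon as in-sentence marks), covers both ASCII and full-width CJK forms, and uses hypothetical function names that do not appear in the patent. Splitting on a bare "." is naive (it would also break decimals and abbreviations), so this is an illustration rather than a production tokenizer.

```python
import re

# Assumed punctuation sets (preferred embodiment), ASCII and full-width forms.
END_MARKS = "。."        # sentence-final: full stop
INNER_MARKS = "，,；;"   # in-sentence: comma and semicolon


def split_on(text, marks):
    """Split text at every character in `marks` and drop empty pieces."""
    pieces = re.split("[" + re.escape(marks) + "]", text)
    return [p.strip() for p in pieces if p.strip()]


def reference_segments(text):
    """Coarse split used for text stored in the reference library."""
    return split_on(text, END_MARKS)


def segments_to_translate(text):
    """Finer split used for the document that is being translated."""
    return split_on(text, END_MARKS + INNER_MARKS)


sample = ("The device comprises a housing, a motor; and a controller. "
          "The motor drives the shaft.")
coarse = reference_segments(sample)
fine = segments_to_translate(sample)
```

Here `coarse` holds two long segments and `fine` holds four short ones, so every short segment is contained in some longer reference segment that carries its context.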
Preferably, similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are provided to the user in priority order.
Because similar sentences may occur in several similar patent documents, a search during translation may return multiple similar reference sentence segments from different files. Since the same sentence can carry different meanings in different contexts, the user may find it hard to pick the best segment; presenting suitable reference sentence segments in priority order saves time and improves translation efficiency.
The priority can be determined by the number and order of the words in the source-language reference sentence segment that are identical to words in the segment to be translated, and/or by the priority of the file that the reference sentence segment comes from. The more similar a file is to the patent document to be translated, the higher its priority: reference sentence segments from such a file are more likely to match the segment to be translated, so they can be assigned a higher priority. During translation, high-priority reference sentence segments are offered to the user first, for example by listing them on the display in priority order from top to bottom or from bottom to top, so that the user sees the high-priority reference sentence segments first.
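A minimal Python sketch of such a ranking follows. The patent names the criteria (count and order of identical words, and priority of the originating file) but not how they are combined, so the combination below is an assumption: file priority first, then the number of shared words, then the longest common subsequence of words as a measure of shared order. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReferenceSegment:
    source: str             # source-language reference segment
    translation: str        # aligned target-language segment
    file_priority: int = 0  # higher means the originating file is more similar


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank(segment, candidates):
    """Order candidates from highest to lowest priority (assumed ordering)."""
    query = segment.lower().split()

    def key(cand):
        words = cand.source.lower().split()
        shared = len(set(query) & set(words))  # number of identical words
        in_order = lcs_length(query, words)    # identical words in the same order
        return (cand.file_priority, shared, in_order)

    return sorted(candidates, key=key, reverse=True)


candidates = [
    ReferenceSegment("the shaft drives the motor", "t1", file_priority=1),
    ReferenceSegment("the motor drives the shaft of the pump", "t2", file_priority=1),
    ReferenceSegment("a housing encloses the motor", "t3", file_priority=2),
]
ranked = rank("the motor drives the shaft", candidates)
```

With equal file priority, the segment that preserves the word order of the query is listed ahead of the one that merely shares the same words.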
Preferably, the similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the reference sentence segments split from the similar patent documents and their patent family members are assigned different priorities according to the weights. When, as sometimes happens, no patent family member of the patent document to be translated can be found, similar patent documents can be located by keyword, for example by searching on keywords in the content or on bibliographic information.
Preferably, the keywords include applicant, inventor, title of the invention, claims and abstract information. The keyword weights can be preset or adjusted by the user.
For example, suppose the reference sentence segments come from a source-language divisional application of the patent document to be translated: the descriptions of the two are identical and only the claims differ in part. The source-language divisional application and its target-language patent family member then have the highest priority, and the reference sentence segments from those files likewise have the highest priority.
Reference sentence segments split from the patent document to be translated itself and from its translation segments are assigned the highest priority.
Preferably, the reference source segments of the similar patent documents and the reference translation segments of their patent family members are stored in the reference sentence library in one-to-one correspondence.
Each extracted reference term translation is provided to the user side by side with its corresponding reference source term.
Each extracted reference translation segment is provided to the user side by side with its corresponding reference source segment.
Presenting the material side by side lets the user draw on the information in the reference source text to understand it, and to select or edit a more suitable reference translation.
Preferably, the user selects and/or modifies one or more of the reference term translations to generate the term translation. The user likewise selects and/or modifies one or more of the extracted reference translation segments to generate the translation segment, and/or selects and/or modifies the automatically generated reference translation segments to generate the translation segment.
Preferably, each generated term translation and its corresponding source term are stored in the reference sentence library in one-to-one correspondence and are applied automatically in subsequent translation. Each generated translation segment and its corresponding source segment are stored in the reference sentence library in one-to-one correspondence and are applied automatically in subsequent translation. Alternatively, after the generated translation segment and its corresponding source segment are stored, they are offered to the user in subsequent translation in the form of a reference source segment and a reference translation segment.
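The storage and reuse behaviour described here resembles a small translation memory. The sketch below is one possible reading, with a hypothetical class and method names: confirmed pairs are stored one-to-one, an exact match is applied automatically, and stored source segments that contain the new segment are offered as side-by-side references.

```python
class ReferenceLibrary:
    """Toy in-memory store of (source segment, translation segment) pairs."""

    def __init__(self):
        self._pairs = {}  # source segment -> translation segment

    def store(self, source, translation):
        """Save a confirmed pair in one-to-one correspondence."""
        self._pairs[source.strip()] = translation.strip()

    def auto_apply(self, segment):
        """Return the stored translation for an identical segment, else None."""
        return self._pairs.get(segment.strip())

    def suggest(self, segment):
        """Return (source, translation) pairs whose source contains the segment."""
        seg = segment.strip()
        return [(s, t) for s, t in self._pairs.items() if seg in s and s != seg]


library = ReferenceLibrary()
library.store("本发明涉及翻译技术",
              "The present invention relates to translation technology")

reused = library.auto_apply("本发明涉及翻译技术")
references = library.suggest("翻译技术")
unknown = library.auto_apply("一种新的方法")
```

A real library would also need the priority information and fuzzy matching discussed in the text; both are left out here to keep the storage idea visible.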
Preferably, non-term words and/or phrases are automatically removed from the extracted high-frequency words and/or high-frequency phrases to generate the source terms; and/or the user removes non-term words and/or phrases from the extracted high-frequency words and/or high-frequency phrases to generate the source terms.
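One simple way to obtain such term candidates is to count words and short phrases and filter them with a stop list. The sketch below assumes whitespace-tokenized English text, a small hand-written stop list standing in for the automatic removal of non-term vocabulary, and a hypothetical `term_candidates` function; the patent does not specify the extraction algorithm. N-grams are counted across sentence boundaries here, which a fuller implementation would avoid.

```python
from collections import Counter

# Assumed stop list of non-term vocabulary; a user could extend or edit it.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "said"}


def term_candidates(text, max_len=2, min_count=2):
    """Return (phrase, count) pairs for frequent words and short phrases."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a non-term word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_count]


text = ("The drive shaft is connected to the motor. "
        "The motor rotates the drive shaft.")
candidates = dict(term_candidates(text))
```

The user-side removal mentioned in the text would correspond to deleting entries from the returned list before the remaining source terms are looked up in the reference library.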
Preferably, the source terms and term translations that occur in the segment to be translated, the reference source segment, the reference translation segment and the translation are displayed in a manner that distinguishes them from other content.
Preferably, the places where the reference source segment differs from the segment to be translated are displayed in a manner that distinguishes them from other content.
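Marking the places where two segments differ can be sketched with Python's standard `difflib` module. The word-level comparison and the bracket markers are assumptions chosen for illustration; an actual interface would more likely use color or underlining, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def mark_differences(to_translate, reference, open_mark="[", close_mark="]"):
    """Return the reference segment with words that differ from the
    segment to be translated wrapped in markers (word-level comparison)."""
    a, b = to_translate.split(), reference.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        words = b[j1:j2]
        if not words:
            continue  # 'delete' opcode: nothing in the reference to show
        if tag == "equal":
            out.extend(words)
        else:
            out.append(open_mark + " ".join(words) + close_mark)
    return " ".join(out)


marked = mark_differences(
    "the motor drives the shaft",
    "the motor drives the pump shaft",
)
```

Only the words that the reference adds or changes are wrapped, so the user can see at a glance which part of a reference translation may need editing.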
Preferably, the patent document to be translated, the similar patent documents and the patent family members may each be a patent application text, a published text, a granted text, or an examiner's office action relating to the application.
In the patent translation method of the invention, the segments to be translated and the segments in the reference database are split in different ways, so the reference sentence segments are longer than the segments to be translated and carry more complete content. On the one hand, reference sentence segments that contain the segment to be translated can be retrieved as comprehensively as possible. On the other hand, the content of the reference sentence segments lets the user understand more fully and clearly what the segment to be translated means in different contexts, and so select or edit a more accurate translation, which makes translation more convenient and accurate. In addition, assigning different priorities to similar reference sentence segments and presenting them to the user in priority order improves translation efficiency. Furthermore, because the user can freely build the reference sentence library as needed, the method makes translation more flexible: even when a user starts a new patent project without an accumulated database from related translation projects, a reference sentence library that closely matches the document to be translated can be built quickly and easily through customization, which saves labor cost and improves efficiency.
The embodiment described above expresses only a specific implementation of the present invention. Although the description is relatively specific and detailed, it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.
Claims (1)
1. A patent document translation method, characterized in that the method comprises the following steps:
A. extracting one or more similar patent documents written in the same language as the patent document to be translated, together with one or more of their patent family members written in the target language, and storing them in a reference sentence library;
B. using high-frequency words and/or phrases extracted from the patent document to be translated as source terms, extracting one or more reference term translations from the reference sentence library, and providing them to the user to generate target-language term translations;
C. for each segment to be translated of the patent document to be translated, automatically extracting from the reference sentence library one or more reference translation segments related to that segment, and/or automatically generating reference translation segments based on syntactic analysis and the reference sentence library, and providing them to the user to generate target-language translation segments;
D. storing the sentence segments of the patent document to be translated and their corresponding translation segments in the reference sentence library for use in subsequent translation;
wherein the similar patent documents, patent family members, patent document to be translated and translation segments are split into reference sentence segments, using sentence-final punctuation as the separator, and stored in the reference sentence library;
the patent document to be translated is split into segments to be translated using both sentence-final and in-sentence punctuation as separators;
similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are provided to the user in priority order;
the similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the reference sentence segments split from the similar patent documents and their patent family members are assigned different priorities according to the weights;
the keywords include applicant, inventor, title of the invention, claims and abstract information, and the keyword weights can be preset or adjusted by the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711250768.4A CN109871548A (en) | 2017-12-01 | 2017-12-01 | A kind of patent document interpretation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711250768.4A CN109871548A (en) | 2017-12-01 | 2017-12-01 | A kind of patent document interpretation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109871548A (en) | 2019-06-11 |
Family
ID=66914631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711250768.4A Pending CN109871548A (en) | 2017-12-01 | 2017-12-01 | A kind of patent document interpretation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109871548A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728156A (en) * | 2019-12-19 | 2020-01-24 | 北京百度网讯科技有限公司 | Translation method and device, electronic equipment and readable storage medium |
CN110807338A (en) * | 2019-11-08 | 2020-02-18 | 北京中献电子技术开发有限公司 | English-Chinese machine translation term consistency self-correcting system and method |
CN112818711A (en) * | 2021-02-23 | 2021-05-18 | 湖北省地震局(中国地震局地震研究所) | Machine translation method for translating multi-word specialized terms in scientific and technological literature |
-
2017
- 2017-12-01 CN CN201711250768.4A patent/CN109871548A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807338A (en) * | 2019-11-08 | 2020-02-18 | 北京中献电子技术开发有限公司 | English-Chinese machine translation term consistency self-correcting system and method |
CN110807338B (en) * | 2019-11-08 | 2022-03-04 | 北京中献电子技术开发有限公司 | English-Chinese machine translation term consistency self-correcting system and method |
CN110728156A (en) * | 2019-12-19 | 2020-01-24 | 北京百度网讯科技有限公司 | Translation method and device, electronic equipment and readable storage medium |
US11574135B2 (en) | 2019-12-19 | 2023-02-07 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, electronic device and readable storage medium for translation |
CN112818711A (en) * | 2021-02-23 | 2021-05-18 | 湖北省地震局(中国地震局地震研究所) | Machine translation method for translating multi-word specialized terms in scientific and technological literature |
CN112818711B (en) * | 2021-02-23 | 2023-11-03 | 湖北省地震局(中国地震局地震研究所) | Machine translation method for translating ambiguous technical terms in scientific literature |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gutierrez-Vasques et al. | Axolotl: a web accessible parallel corpus for Spanish-Nahuatl | |
Fantinuoli et al. | Creating and using multilingual corpora in translation studies | |
Costa et al. | A comparative user evaluation of terminology management tools for interpreters | |
CN109871548A (en) | A kind of patent document interpretation method | |
CN109871546A (en) | A kind of patent document translation system | |
Héja | The Role of Parallel Corpora in Bilingual Lexicography. | |
Crasborn et al. | From corpus to lexicon: the creation of ID-glosses for the Corpus NGT | |
Zaghouani et al. | A pilot propbank annotation for quranic arabic | |
Généreux et al. | A large Portuguese corpus on-line: cleaning and preprocessing | |
Litkowski | The preposition project corpora | |
Frankenberg-Garcia | Compiling and using a parallel corpus for research in translation | |
Kopřivová et al. | From dictionary to corpus | |
Griesel et al. | Navigating challenges of multilingual resource development for under-resourced languages: The case of the African Wordnet project | |
Kim et al. | Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia based on DBpedia. | |
Rosmorduc | Computational linguistics in egyptology | |
Rimkutė et al. | Corpus of contemporary Lithuanian language–the standardised way | |
Lew | Dictionaries and technology | |
Meurant et al. | Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation | |
Parvez | Named entity recognition from bengali newspaper data | |
Meurant et al. | Modelling a parallel corpus of french and french belgian sign language | |
Aldezabal et al. | Basque e-lexicographic resources: linguistic basis, development, and future perspectives | |
Jettka et al. | Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data | |
Beal et al. | Taming digital voices and texts: Models and methods for handling unconventional diachronic corpora | |
Lugli | Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan | |
Abdumanapovna | The role of sketch engine in multiple types of corpora |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190611 |