CN109871548A - A patent document translation method - Google Patents

A patent document translation method

Info

Publication number
CN109871548A
CN109871548A CN201711250768.4A CN201711250768A CN109871548A CN 109871548 A CN109871548 A CN 109871548A CN 201711250768 A CN201711250768 A CN 201711250768A CN 109871548 A CN109871548 A CN 109871548A
Authority
CN
China
Prior art keywords
statement
translation
patent document
translated
interlude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711250768.4A
Other languages
Chinese (zh)
Inventor
蒋洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Luyuan Enterprise Management Consulting Co Ltd
Original Assignee
Sichuan Luyuan Enterprise Management Consulting Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention discloses a patent document translation method. One or more similar patent documents in the same language as the patent document to be translated, together with one or more of their patent family members in the target language, are extracted to form a reference sentence library. High-frequency words and/or high-frequency phrases extracted from the patent document to be translated serve as source terms, and one or more reference term translations are extracted from the reference sentence library and presented to the user in order to generate term translations in the target language. For each sentence segment of the patent document to be translated, one or more reference translation segments related to that segment are automatically extracted from the reference sentence library, and/or reference translation segments are automatically generated based on syntactic analysis and the reference sentence library, and these are presented to the user in order to generate translated segments in the target language. The user can thus quickly and conveniently build a reference sentence library even when no translation memory exists for the project, which improves translation efficiency and the rigor and consistency of the translation and reduces the error rate.

Description

A patent document translation method
Technical field
The present invention relates to translation technology, and in particular to a patent document translation method.
Background art
With the spread of the Internet, computer processing of natural language has become an important means of obtaining knowledge from the Internet. In fields such as international exchange, scientific research, and education, for example, people need to translate foreign languages, a task that in the past was the domain of expert linguists. With the rapid development and continuous improvement of computer technology, machine translation is being used more and more widely. Machine translation has significant advantages: it is fast, it has a large memory capacity, and it can reduce translation costs. Its drawback is that translation quality is still far from meeting users' needs, so developing a high-quality machine translation method has become an urgent problem.
Because documents in different technical fields have their own characteristics, current general-purpose machine translation is difficult to apply to every technical field. A patent document is both a technical document and a legal document; its translation must be accurate and rigorous, so the quality requirements are high, and patent translators therefore mostly translate every sentence manually. Although a translator can work from family applications, the related patents must first be searched and the relevant text selected by hand, which is inefficient and error-prone. Because of the way patent documents are written, they contain many conventional patent phrases, such as application number, applicant, abstract, claims, description, technical field, background art, summary of the invention, and detailed description. They also contain many fixed and commonly used sentence patterns, such as "said ...", "the present invention provides a ... method", "the ... provided by the present invention has at least the following advantages", and "the scope of the claims of this patent is ...". Patent application documents also contain a large amount of high-frequency vocabulary; for example, terms that appear in the claims recur throughout the summary of the invention and the detailed description. Because of these characteristics, a patent document offers a large amount of information that can be consulted during translation.
Based on these characteristics, Chinese patent application publication No. CN103488627A discloses a full-text patent document translation method and translation system. The system provided by that patent obtains phrases by performing lexical analysis on the full text, translates the phrases into the target language, and performs error identification and correction on the translation results. During full-text translation, the noun phrases are replaced directly with the corrected results, and after translation is complete the output follows the heading order of the source text. Although that patent can identify the complex noun phrases commonly used in patent documents, reduce the time needed to analyze sentences containing them, and increase translation speed, it has the following defects:
(1) the patent document collection it draws on is broad and poorly targeted, so using it as the source of reference translations for a phrase to be translated easily leads to mistranslated technical terms;
(2) in the full-text translation, only the phrase translations are corrected, while sentences are translated directly by the translation system provided by that patent, so sentence translations are not very accurate;
(3) that patent applies only to the translation of patent application documents and is not suitable for other patent-related documents, such as notifications and the letters that forward them;
(4) the method does not fully take into account the way proper nouns are translated in patent documents, or the fact that patent literature databases contain a large amount of information that can be consulted.
Most importantly, the translation method provided by that patent mainly extracts, as reference translations, the high-frequency phrases in the patent document collection that correspond to the phrase to be translated. Simply selecting the translation of a high-frequency phrase as the reference translation does not take into account the meaning of the phrase within the patent document. As with general-purpose translation software, this easily leads to mistranslated phrases, and especially mistranslated technical terms, so that the translation of the patent document loses its rigor and professionalism.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a patent document translation method comprising the following steps: A. extracting one or more similar patent documents in the same language as the patent document to be translated, together with one or more of their patent family members in the target language, and storing them in a reference sentence library; B. using high-frequency words and/or phrases extracted from the patent document to be translated as source terms, extracting one or more reference term translations from the reference sentence library, and presenting them to the user in order to generate term translations in the target language; C. for each segment to be translated of the patent document to be translated, automatically extracting from the reference sentence library one or more reference translation segments related to that segment, and/or automatically generating reference translation segments based on syntactic analysis and the reference sentence library, and presenting them to the user in order to generate translated segments in the target language; D. storing the sentence segments of the patent document to be translated and their corresponding translated segments in the reference sentence library for use in subsequent translation. The similar patent documents, the patent family members, the patent document to be translated, and the translated segments are split into reference sentence segments using sentence-final punctuation as the separator and stored in the reference sentence library, while the patent document to be translated is split into segments to be translated using both sentence-final punctuation and intra-sentence punctuation as separators. Similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are presented to the user in priority order. The similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the weights determine the different priorities assigned to the reference sentence segments into which the similar patent documents and their patent family members are split. The keywords include the applicant, the inventor, the title of the invention, the claims, and the abstract information, and the keyword weights can be preset or adjusted by the user.
Detailed description
The technical solution of the present invention is described in further detail below, but the scope of protection of the present invention is not limited to the following.
Embodiment:
A patent translation method comprises the following steps. One or more similar patent documents and one or more of their patent family members are extracted and stored in a reference sentence library. High-frequency words and/or high-frequency phrases extracted from the patent document to be translated are used as source terms, and one or more reference term translations are extracted from the reference sentence library and presented to the user in order to generate term translations in the target language.
For each segment to be translated of the patent document to be translated, one or more reference translation segments related to that segment are automatically extracted from the reference sentence library, and/or reference translation segments are automatically generated based on syntactic analysis and the reference sentence library, and these are presented to the user in order to generate translated segments in the target language.
The sentence segments of the patent document to be translated and their corresponding translated segments are stored in the reference sentence library for use in subsequent translation. Here the language of the document to be translated is the source language, and the language into which it is to be translated is the target language. The similar documents are in the source language. The patent family members are in the target language.
The similar patent documents, the patent family members, the patent document to be translated, and the translated segments are split into reference sentence segments using sentence-final punctuation as the separator and stored in the reference sentence library.
The patent document to be translated is split into segments to be translated using both sentence-final punctuation and intra-sentence punctuation as separators.
Because the reference sentence segments and the segments to be translated are split in different ways, the segments in the reference sentence library are longer than the segments to be translated. During retrieval, the reference sentence segments that contain a segment to be translated can then be found as comprehensively as possible, and their surrounding context helps the user choose a suitable translation.
The source language can be Chinese, Japanese, English, Korean, German, French, Spanish, Italian, Thai, Russian, and so on. The target language can be Chinese, Japanese, English, Korean, German, French, Spanish, Italian, Thai, Russian, and so on.
The sentence-final punctuation includes one or more of the full stop, the question mark, and the exclamation mark. The intra-sentence punctuation includes one or more of the comma, the enumeration comma, the semicolon, and the colon.
Preferably, the sentence-final punctuation is the full stop, and the intra-sentence punctuation is the comma and/or the semicolon.
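The two splitting rules can be illustrated with a minimal Python sketch. The function names, the regular expression, and the exact punctuation sets (which mix Chinese and Western marks) are assumptions made for illustration and are not part of the patent text.

```python
import re

# Assumed punctuation sets; the patent only names the categories of marks.
SENTENCE_FINAL = "。？！.?!"
INTRA_SENTENCE = "，、；：,;:"


def split_segments(text, separators):
    """Split text after each separator character and drop empty pieces."""
    sep_class = re.escape(separators)
    # A run of non-separator characters, optionally followed by one separator.
    pattern = "[^" + sep_class + "]+[" + sep_class + "]?"
    return [piece.strip() for piece in re.findall(pattern, text) if piece.strip()]


def reference_segments(text):
    """Coarse split for the reference library: sentence-final marks only."""
    return split_segments(text, SENTENCE_FINAL)


def segments_to_translate(text):
    """Fine split for the document to be translated: sentence-final and
    intra-sentence marks."""
    return split_segments(text, SENTENCE_FINAL + INTRA_SENTENCE)
```

In this sketch a Western full stop is always treated as a sentence end, so decimals and abbreviations would be split incorrectly; a real implementation would need extra rules for those cases.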
Preferably, similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are presented to the user in priority order.
Because similar sentences may occur in several similar patent documents, translation may retrieve multiple similar reference sentence segments from different documents. Because the same sentence may carry different meanings in different contexts, it is difficult for the user to pick the best segment among them. Presenting suitable reference sentence segments to the user in priority order saves time and improves translation efficiency.
The priority can be determined by the number and order of the words that the source-language reference sentence segment shares with the segment to be translated, and/or by the priority of the document in which the reference sentence segment is found. The more similar a document is to the patent document to be translated, the higher its priority: reference sentence segments from that document are more likely to match the segment to be translated and can therefore be given a higher priority. During translation, high-priority reference sentence segments are presented to the user first, for example by displaying them on screen in priority order from top to bottom or from bottom to top, so that the user sees the high-priority reference sentence segments first.
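One possible reading of this priority rule is sketched below in Python. Measuring "number and order of shared words" as the longest common subsequence of words, tokenizing on whitespace, and using the document priority only as a tie-breaker are all assumptions; the patent does not specify a formula. The `ReferenceSegment` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReferenceSegment:
    source: str          # reference segment in the source language
    translation: str     # aligned segment in the target language
    file_priority: int   # higher means the originating document is more similar


def shared_words_in_order(candidate, query):
    """Count words shared by both texts in the same order (longest common
    subsequence over whitespace-separated, lower-cased words)."""
    a, b = candidate.lower().split(), query.lower().split()
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_references(query, references):
    """Order references by shared-word score, then by document priority."""
    return sorted(
        references,
        key=lambda ref: (shared_words_in_order(ref.source, query), ref.file_priority),
        reverse=True,
    )
```

Whitespace tokenization only works for languages that separate words with spaces; Chinese or Japanese source text would need a word segmenter before this comparison.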
Preferably, the similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the weights determine the different priorities assigned to the reference sentence segments into which the similar patent documents and their patent family members are split. In cases where it is difficult to find a patent family member of the patent document to be translated, similar patent documents can be found by keyword, for example by searching on keywords in the content or on bibliographic information.
Preferably, the keywords include the applicant, the inventor, the title of the invention, the claims, and the abstract information. The keyword weights can be preset or adjusted by the user.
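As a rough illustration of weighted keyword retrieval, the Python sketch below scores candidate documents field by field. The default weights, the dictionary layout of a document, and the word-overlap (Jaccard) measure are assumptions chosen for illustration; the patent only states that keywords carry user-adjustable weights.

```python
# Hypothetical default weights; the patent lets the user preset or adjust them.
DEFAULT_WEIGHTS = {
    "applicant": 3.0,
    "inventor": 2.0,
    "title": 2.0,
    "claims": 1.5,
    "abstract": 1.0,
}


def field_overlap(a, b):
    """Jaccard overlap of the word sets of two field values (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def similarity(query_doc, candidate, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-field overlaps between two documents."""
    return sum(
        weight * field_overlap(query_doc.get(field, ""), candidate.get(field, ""))
        for field, weight in weights.items()
    )


def rank_candidates(query_doc, candidates, weights=DEFAULT_WEIGHTS):
    """Most similar candidates first; the score can serve as document priority."""
    return sorted(
        candidates,
        key=lambda cand: similarity(query_doc, cand, weights),
        reverse=True,
    )
```

Passing a different `weights` dictionary changes the ranking, which mirrors the user-adjustable weighting described above.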
For example, suppose a reference sentence segment comes from a source-language divisional application of the patent document to be translated; the descriptions of the two are identical, and only part of the claims differs. The source-language divisional application and its target-language patent family member then have the highest priority, and the reference sentence segments from those documents also have the highest priority.
The reference sentence segments split from the patent document to be translated and its translated segments are assigned the highest priority.
Preferably, the reference source segments of the similar patent documents and the reference translation segments of their patent family members are stored in the reference sentence library in one-to-one correspondence.
The extracted reference term translations are presented to the user side by side with the corresponding reference source terms.
The extracted reference translation segments are presented to the user side by side with the corresponding reference source segments.
Presenting them side by side lets the user understand each reference translation in light of the reference source text and select or edit a more suitable reference translation.
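The one-to-one storage and side-by-side presentation can be sketched in Python as a list of aligned pairs with a containment lookup. The list-of-tuples layout, the function names, and the choice to ignore trailing punctuation and letter case when matching are assumptions for illustration only.

```python
# Punctuation stripped from the end of a segment before matching (assumed set).
TRAILING_MARKS = "，、；：,;:。？！.?!"


def find_references(segment, library):
    """Return the (source, translation) pairs whose source text contains the
    segment to be translated, ignoring case and trailing punctuation."""
    needle = segment.strip().rstrip(TRAILING_MARKS).lower()
    if not needle:
        return []
    return [(src, tgt) for src, tgt in library if needle in src.lower()]


def format_side_by_side(pairs):
    """Render each reference source segment next to its reference translation."""
    return "\n".join(f"{src}  =>  {tgt}" for src, tgt in pairs)
```

Because the library segments are whole sentences while the query is a shorter clause, a hit shows the user the full sentence in which the clause occurred together with its aligned translation.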
Preferably, the user selects and/or modifies the one or more reference term translations to generate the term translation. The user selects and/or modifies the one or more extracted reference translation segments to generate the translated segment, and/or the user selects and/or modifies the automatically generated reference translation segments to generate the translated segment.
Preferably, each generated term translation and its corresponding source term are stored in the reference sentence library in one-to-one correspondence and applied automatically in subsequent translation. Each generated translated segment and its corresponding source segment are likewise stored in the reference sentence library in one-to-one correspondence and applied automatically in subsequent translation. Alternatively, after a generated translated segment and its corresponding source segment are stored, they are presented to the user in subsequent translation as a reference source segment and a reference translation segment.
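The store-and-reuse loop can be sketched in Python as follows. The `TranslationStore` class, the exact-match lookup, and the `ask_user` callback that stands in for the user's select-or-edit step are all hypothetical; the patent does not describe how automatic application is implemented.

```python
class TranslationStore:
    """Keeps confirmed source/translation pairs for reuse later in the job."""

    def __init__(self):
        self._segments = {}

    def store(self, source, translation):
        self._segments[source.strip()] = translation.strip()

    def auto_apply(self, source):
        """Return the stored translation for an identical segment, or None."""
        return self._segments.get(source.strip())


def translate_document(segments, store, ask_user):
    """Translate segments in order, reusing stored translations when an
    identical segment has already been confirmed by the user."""
    output = []
    for segment in segments:
        translation = store.auto_apply(segment)
        if translation is None:
            # No stored match: the user produces the translation, which is
            # then stored so that later identical segments are filled in.
            translation = ask_user(segment)
            store.store(segment, translation)
        output.append(translation)
    return output
```

With this design a sentence pattern that recurs in the claims and the description is confirmed by the user once and then filled in automatically wherever it reappears.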
Preferably, non-term words and/or phrases are removed automatically from the extracted high-frequency words and/or high-frequency phrases to generate the source terms; and/or non-term words and/or phrases are removed from the extracted high-frequency words and/or high-frequency phrases according to the user's own settings to generate the source terms.
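A minimal Python sketch of this term-candidate step is shown below. The stop-word list, the frequency threshold, the restriction to one- and two-word phrases, and the `user_excluded` parameter are assumptions for illustration; the patent does not say how non-terms are recognized.

```python
import re
from collections import Counter

# Assumed list of non-term function words to remove automatically.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "said", "wherein"}


def candidate_terms(text, min_count=2, max_len=2, user_excluded=()):
    """Return high-frequency words and phrases, most frequent first, after
    removing stop-word-bounded n-grams and user-excluded entries."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Automatic removal: skip n-grams that start or end with a stop word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    excluded = set(user_excluded)
    return [
        term
        for term, count in counts.most_common()
        if count >= min_count and term not in excluded
    ]
```

The `user_excluded` argument models the user-defined removal: entries the user marks as non-terms are dropped from the candidate list before term translations are looked up.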
Preferably, the source terms and term translations in the segment to be translated, the reference source segment, the reference translation segment, and the translated segment are displayed in a manner that distinguishes them from the other content.
Preferably, the places where the reference source segment differs from the segment to be translated are displayed in a manner that distinguishes them from the other content.
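One way to locate the differing places is a word-level diff, sketched below in Python with the standard-library `difflib` module. Marking differences with square brackets and comparing whitespace-separated words are assumptions; the patent only requires that the differences be displayed distinctly.

```python
import difflib


def mark_differences(reference, segment):
    """Return the reference source segment with the words that differ from
    the segment to be translated wrapped in square brackets."""
    ref_words, seg_words = reference.split(), segment.split()
    matcher = difflib.SequenceMatcher(a=seg_words, b=ref_words, autojunk=False)
    marked = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = ref_words[j1:j2]
        if not chunk:
            # Words present only in the segment to be translated have no
            # counterpart in the reference, so there is nothing to mark.
            continue
        if tag == "equal":
            marked.extend(chunk)
        else:
            marked.append("[" + " ".join(chunk) + "]")
    return " ".join(marked)
```

A user interface could replace the brackets with color or underlining; the diff itself stays the same.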
Preferably, the patent document to be translated, the related patent documents, and the patent family members can be the application text, the published text, or the granted text of a patent, or an examiner's office action relating to the application.
In the patent translation method of the present invention, the segments to be translated and the segments in the reference library are split in different ways, so that the reference sentence segments are longer than the segments to be translated and carry more complete content. On the one hand, reference sentence segments containing a segment to be translated can be retrieved as comprehensively as possible. On the other hand, the content of the reference sentence segments lets the user understand more fully and clearly what the segment to be translated means in different contexts, and so select or edit a more accurate translation, making translation more convenient and accurate. In addition, assigning different priorities to similar reference sentence segments and presenting them to the user in priority order improves translation efficiency. Furthermore, because the user can freely build the reference sentence library as needed, translation becomes more flexible: even when the user starts a new patent translation project without an accumulated database for that project, a reference sentence library closely matched to the document to be translated can be built quickly and conveniently through customization, which saves labor cost and improves efficiency.
The embodiment described above expresses only a specific implementation of the present invention. Its description is relatively specific and detailed, but it must not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Claims (1)

1. A patent document translation method, characterized in that the method comprises the following steps:
A. extracting one or more similar patent documents in the same language as the patent document to be translated, together with one or more of their patent family members in the target language, and storing them in a reference sentence library;
B. using high-frequency words and/or phrases extracted from the patent document to be translated as source terms, extracting one or more reference term translations from the reference sentence library, and presenting them to the user in order to generate term translations in the target language;
C. for each segment to be translated of the patent document to be translated, automatically extracting from the reference sentence library one or more reference translation segments related to that segment, and/or automatically generating reference translation segments based on syntactic analysis and the reference sentence library, and presenting them to the user in order to generate translated segments in the target language;
D. storing the sentence segments of the patent document to be translated and their corresponding translated segments in the reference sentence library for use in subsequent translation;
wherein the similar patent documents, the patent family members, the patent document to be translated, and the translated segments are split into reference sentence segments using sentence-final punctuation as the separator and stored in the reference sentence library,
and the patent document to be translated is split into segments to be translated using both sentence-final punctuation and intra-sentence punctuation as separators;
similar reference sentence segments in the reference sentence library are assigned different priorities, and reference sentence segments are presented to the user in priority order;
the similar patent documents and their patent family members are retrieved from one or more patent databases using keywords extracted from the patent document to be translated and the weights of those keywords, and the weights determine the different priorities assigned to the reference sentence segments into which the similar patent documents and their patent family members are split;
the keywords include the applicant, the inventor, the title of the invention, the claims, and the abstract information, and the keyword weights can be preset or adjusted by the user.
CN201711250768.4A 2017-12-01 2017-12-01 A patent document translation method Pending CN109871548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711250768.4A CN109871548A (en) 2017-12-01 2017-12-01 A kind of patent document interpretation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711250768.4A CN109871548A (en) 2017-12-01 2017-12-01 A kind of patent document interpretation method

Publications (1)

Publication Number Publication Date
CN109871548A (en) 2019-06-11

Family

ID=66914631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711250768.4A Pending CN109871548A (en) 2017-12-01 2017-12-01 A kind of patent document interpretation method

Country Status (1)

Country Link
CN (1) CN109871548A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728156A (en) * 2019-12-19 2020-01-24 北京百度网讯科技有限公司 Translation method and device, electronic equipment and readable storage medium
CN110807338A (en) * 2019-11-08 2020-02-18 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
CN112818711A (en) * 2021-02-23 2021-05-18 湖北省地震局(中国地震局地震研究所) Machine translation method for translating multi-word specialized terms in scientific and technological literature

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807338A (en) * 2019-11-08 2020-02-18 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
CN110807338B (en) * 2019-11-08 2022-03-04 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
CN110728156A (en) * 2019-12-19 2020-01-24 北京百度网讯科技有限公司 Translation method and device, electronic equipment and readable storage medium
US11574135B2 (en) 2019-12-19 2023-02-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, electronic device and readable storage medium for translation
CN112818711A (en) * 2021-02-23 2021-05-18 湖北省地震局(中国地震局地震研究所) Machine translation method for translating multi-word specialized terms in scientific and technological literature
CN112818711B (en) * 2021-02-23 2023-11-03 湖北省地震局(中国地震局地震研究所) Machine translation method for translating ambiguous technical terms in scientific literature

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190611