CN101539906A - System and method for automatically analyzing patent text - Google Patents

System and method for automatically analyzing patent text

Info

Publication number
CN101539906A
CN101539906A CN200810085054A CN200810085054A CN101539906A CN 101539906 A CN101539906 A CN 101539906A CN 200810085054 A CN200810085054 A CN 200810085054A CN 200810085054 A CN200810085054 A CN 200810085054A CN 101539906 A CN101539906 A CN 101539906A
Authority
CN
China
Prior art keywords
knowledge
processor
language
ontology
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810085054A
Other languages
Chinese (zh)
Inventor
张国明
Original Assignee
亿维讯软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 亿维讯软件(北京)有限公司 filed Critical 亿维讯软件(北京)有限公司
Priority to CN200810085054A priority Critical patent/CN101539906A/en
Publication of CN101539906A publication Critical patent/CN101539906A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a system for automatically analyzing patent text, comprising an expert knowledge processor, an ontology processor, a language knowledge base, an expert knowledge base and an ontology knowledge base. The expert knowledge processor and the ontology processor work in parallel, and the expert knowledge base and the ontology knowledge base likewise stand in a parallel relation. The invention also provides a method for automatically analyzing patent text: with the help of the language knowledge base, the expert knowledge processor extracts the full-text patent data of a patent database and represents it in structured form, generating the expert knowledge base and updating it automatically; and, with the help of the language knowledge base, the ontology processor extracts ontology terms from the full-text patent data of the patent database and identifies the relations between them, generating the ontology knowledge base and updating it automatically.

Description

System and method for automatically analyzing patent text
Technical field
The present invention relates to a system and method for automatically analyzing patent text (in particular, the published text of invention patent applications and granted patents), which can be used to improve the results of user queries.
Background art
An invention, within the meaning of the Patent Law, is a new technical solution proposed for a product, a method, or an improvement thereof. Because patent documents have the character of legal documents, they are formally standardized and rigorously worded, but their length and complex style greatly reduce the intelligibility of patents and the effectiveness of knowledge sharing. Processing patents with natural language technology can make patents more efficient to use and promote their effective utilization.
The form and drafting style of patent text are relatively uniform and fixed, and the terminology is comparatively standardized. Patent documents often contain fixed sentence patterns, and these sentence templates lend themselves to automatic machine processing. The standardization of patent terminology, in turn, makes knowledge discovery in patents possible.
Existing patent text analysis techniques include patent text translation, patent information extraction, patent classification and clustering, automatic patent summarization, patent generation, patent value assessment, and improving patent readability. Most of these techniques are still at the experimental stage, and no mature commercial products have emerged yet.
Chinese patent publication No. CN99813079, entitled "Document semantic analysis and selection with knowledge generation capability", discloses a computer-based software system and method for semantically processing a natural language request entered by a user. The system identifies and stores the subject-action-object (SAO) structures of the request, uses these structures as keywords/phrases to search local and Web-based databases and download candidate natural language documents, semantically processes the candidate document text into candidate document SAO structures, and selects and stores only those relevant documents whose SAO structures match the stored request SAO structures. Further features include analyzing the relationships among the SAO structures of the relevant documents, generating from these relationships new SAO structures that may yield new knowledge concepts and ideas for display to the user, and generating and displaying natural language summaries from the SAO structures of the relevant documents. The SAO representation it proposes simplifies document representation, helps improve precision, and allows document summaries to be generated automatically from SAO structures; its weakness is that the matching method cannot guarantee recall.
Chinese patent application No. 200410078337.0, entitled "Method for solving problems using ontology and user query processing techniques", discloses a system, method and computer program that represent and process knowledge/data in a semantic processing module based on an ontological approach in order to solve technical problems. The basic components of the semantic processing module include a semantic knowledge base, an ontology knowledge base, and/or an expert knowledge base. The method comprises storing a structured or semi-structured description of the user query, semantically analyzing an unstructured query to form a formal semantic representation of it, semantically expanding the formal semantic query, using the expanded query to look up relevant solutions in the expert knowledge base, and classifying the solutions found according to their semantic relations. The system can parse and expand user query requests, and the query results it provides can satisfy the user's needs to a large extent. It still has a weakness, however: the expert knowledge base and the ontology knowledge base are its core computing resources, and if they are built manually, the construction is extremely complex and laborious and involves an enormous amount of work, while their management and maintenance is also a major problem.
Summary of the invention
The object of the present invention is to provide a system and method for automatically analyzing patent text. The system and method use natural language processing techniques to process full-text patent data and supply the knowledge data required by the expert knowledge base and the ontology knowledge base, thereby reducing as far as possible the cost of acquiring and maintaining these two knowledge bases.
The present invention proposes a system for automatically analyzing patent text (in particular, invention patents). It mainly comprises a language processing system whose basic components are a language knowledge base 1, an expert knowledge base 2, an ontology knowledge base 3, an expert knowledge processor 10 and an ontology processor 11. By processing the full patent texts in the patent database 8, the invention obtains two specific knowledge bases from patent data, the expert knowledge base 2 and the ontology knowledge base 3, and thereby provides knowledge-level support for solving (but not limited to) inventive problems or users' technical problems.
The language knowledge base 1 provides the language analysis of a user query and its formal semantic representation, that is, the technical problem expressed as "Verb-Parameter-Object" (VPO). The language knowledge base 1 may include, but is not limited to, analysis rules, a lemmatization dictionary, linguistic logic, and a classification of noun phrases. It supplies the lexical knowledge and language-structure knowledge needed for the language analysis of patent text, and it can give the formal semantic representation corresponding to a user's search request. The form and drafting style of patent text are relatively uniform and fixed, and the terminology is comparatively standardized. Patent text often contains fixed sentence patterns such as "The object of the invention is X" and "The X according to claim N, characterized in that Y", where X and Y may be any word or sentence and N is any combination of digits. These sentence templates lend themselves to automatic machine processing and are an important component of the language knowledge base 1.
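The fixed sentence patterns mentioned above can be illustrated with regular expressions. The following is a minimal sketch, not the patented implementation: the English wording of the two templates, the names TEMPLATES and match_template, and the use of named groups X, N and Y are all assumptions made for illustration.

```python
import re

# Hypothetical English renderings of the two fixed sentence patterns.
# X and Y capture arbitrary text, N captures a run of digits.
TEMPLATES = {
    "purpose": re.compile(
        r"^The (?:object|purpose) of the (?:present )?invention is (?P<X>.+?)\.?$",
        re.IGNORECASE,
    ),
    "claim_feature": re.compile(
        r"^The (?P<X>.+?) (?:according to|of) claim (?P<N>\d+), "
        r"characterized in that (?P<Y>.+?)\.?$",
        re.IGNORECASE,
    ),
}

def match_template(sentence):
    """Return (template name, captured slots) for the first matching template, else None."""
    for name, pattern in TEMPLATES.items():
        match = pattern.match(sentence.strip())
        if match:
            return name, match.groupdict()
    return None
```

A sentence that fits neither template returns None, so a caller could fall back to full syntactic analysis for such sentences.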
The expert knowledge base 2 is a knowledge base of solutions to technical problems. It is derived from many text documents, mainly patent data, and is generated by the expert knowledge processor 10. A solution in the expert knowledge base 2 can be expressed in SVPO (subject-verb-parameter-object) form, where S is the subject term, that is, the solution to the technical function defined by the VPO.
The ontology knowledge base 3 contains certain knowledge about the surrounding world, represented by many words (concepts and verbs) from different knowledge domains and by the semantic relations between these words, for example the synonym relation, the genus relation (also called the hierarchical relation) and the association relation.
The expert knowledge processor 10 and the ontology processor 11 are both components of the language processor system, and they work in parallel.
The expert knowledge processor 10 is a device that extracts the core content of a patent and builds the structured expert knowledge base 2 from it. The expert knowledge base 2 serves as the carrier of solutions to technical problems and provides the data resources for knowledge applications at the application layer. The expert knowledge processor 10 comprises: a preprocessor, for word-form recognition and sentence splitting; a morphological processor, for tagging parts of speech; a syntactic processor, for identifying syntactic structures; a semantic processor, for annotating the meaning expressed by each main syntactic structure, which yields patent text annotated with rich linguistic information; and a natural language synthesizer, for generating a structured knowledge entry, importing it into the expert knowledge base, and creating or updating the SVPO-based semantic index. The function of the expert knowledge processor 10 is to extract full-text patent data and represent it in structured form, thereby producing the required expert knowledge base 2.
The expert knowledge processor 10 works as follows. A patent text from the patent database 8 is passed, under the guidance of the language knowledge base 1, through the preprocessor 12, the morphological processor 13, the syntactic processor 14 and the semantic processor 15 of the expert knowledge processor 10, which yields patent text annotated with rich linguistic information. The natural language synthesizer 16 then generates the required solution knowledge, imports it into the expert knowledge base 2, and creates or updates the SVPO-based semantic index.
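The five stages of this workflow can be sketched as a chain of small functions. This is a toy illustration under strong assumptions: the two-word LEXICON stands in for the language knowledge base 1, the parser only understands the pattern "subject verb parameter of object", and the index is keyed on the (V, P, O) triple. None of these choices is specified by the source.

```python
# Toy lexicon standing in for language knowledge base 1 (hypothetical content).
LEXICON = {"prevents": ("VERB", "prevent"), "of": ("PREP", "of")}

def preprocess(text):
    """Preprocessor 12: split into sentences, then into word tokens."""
    return [s.split() for s in text.split(".") if s.strip()]

def tag(tokens):
    """Morphological processor 13: attach (part of speech, lemma) to each token."""
    return [(t, *LEXICON.get(t.lower(), ("NOUN", t.lower()))) for t in tokens]

def parse(tagged):
    """Syntactic processor 14: split around the first verb and the first preposition."""
    v = next(i for i, (_, pos, _) in enumerate(tagged) if pos == "VERB")
    rest = tagged[v + 1:]
    p = next((i for i, (_, pos, _) in enumerate(rest) if pos == "PREP"), None)
    if p is None:  # no "P of O" pattern: treat everything after the verb as the object
        return tagged[:v], tagged[v], [], rest
    return tagged[:v], tagged[v], rest[:p], rest[p + 1:]

def label(subject, verb, parameter, obj):
    """Semantic processor 15: name the role each syntactic chunk plays."""
    join = lambda chunk: " ".join(lemma for _, _, lemma in chunk)
    return {"S": join(subject), "V": verb[2], "P": join(parameter), "O": join(obj)}

def synthesize(entry, sentence, knowledge_base, index):
    """Natural language synthesizer 16: store the entry and update the index."""
    knowledge_base.append({**entry, "text": sentence})
    key = (entry["V"], entry["P"], entry["O"])
    index.setdefault(key, []).append(len(knowledge_base) - 1)

def process(text, knowledge_base, index):
    """Run every sentence of a text through the five stages in order."""
    for tokens in preprocess(text):
        synthesize(label(*parse(tag(tokens))), " ".join(tokens), knowledge_base, index)
```

A sentence without a verb in the lexicon makes parse raise StopIteration; a real system would need a full tagger and parser at these two stages.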
The ontology processor 11 is a device that automatically identifies ontology terms and the relations between them and dynamically updates the ontology knowledge base 3. The ontology knowledge base 3 supports semantic expansion and knowledge organization at the application layer. The ontology processor 11 comprises: a preprocessor, for word-form recognition and sentence splitting; an ontology recognizer, for extracting ontology terms; a relation recognizer, for identifying the relations between ontology terms; and an ontology updater, for automatically updating the ontology knowledge base. The function of the ontology processor 11 is to extract ontology terms from full-text patent data, identify the relations between them, and automatically update the ontology knowledge base 3.
The ontology processor 11 works as follows. A patent text from the patent database 8 is passed, under the guidance of the language knowledge base 1, through the preprocessor 17, the ontology recognizer 18 and the relation recognizer 19 of the ontology processor 11, which yields the ontology terms (concepts and verbs) contained in the text and the relations between them. The ontology updater 20 then imports the terms into the ontology knowledge base 3. The ontology updater 20 also detects whether an extracted term already exists in the ontology knowledge base and locates it there.
The patent database 8 may be a language-independent database that stores a certain number of patent texts. It may be a full-text patent database or a patent claims database. As to language, the patents may be in English or in Chinese.
The present invention also proposes a method for automatically analyzing patent text (in particular, invention patents), comprising:
with the help of the language knowledge base, using the expert knowledge processor to extract the full-text patent data in the patent database and represent it in structured form, generating the expert knowledge base, and automatically updating the expert knowledge base;
with the help of the language knowledge base, using the ontology processor to extract ontology terms from the full-text patent data in the patent database and identify the relations between them, generating the ontology knowledge base, and automatically updating the ontology knowledge base.
The step of obtaining the expert knowledge base comprises: the preprocessor performs word-form recognition and sentence splitting; the morphological processor tags parts of speech; the syntactic processor identifies syntactic structures; the semantic processor annotates the meaning expressed by each main syntactic structure, yielding patent text annotated with rich linguistic information; the natural language synthesizer generates a structured knowledge entry, imports it into the expert knowledge base, and creates or updates the semantic index. The semantic index is based on the subject-verb-parameter-object (SVPO) form. Solutions in the expert knowledge base are expressed in the subject-verb-parameter-object (SVPO) form.
The step of obtaining the ontology knowledge base comprises: the preprocessor performs word-form recognition and sentence splitting; the ontology recognizer extracts ontology terms; the relation recognizer identifies the relations between them; the ontology updater automatically updates the ontology knowledge base. The ontology updater can also detect and locate extracted terms in the ontology knowledge base.
The language knowledge base comprises at least analysis rules, a lemmatization dictionary, linguistic logic and a classification of noun phrases; it supplies the lexical knowledge and language-structure knowledge needed for the language analysis of patent text and can give the formal semantic representation corresponding to a user's search request.
The patent database is a language-independent database that stores a certain number of patent texts. It is a full-text patent database or a patent claims database.
With the technical solution of the present invention, the following can be achieved:
1) automatic extraction from patent text, assisting the generation of the expert knowledge base (solutions);
2) automatic identification of the ontology terms and technical terms that occur in patents, determination of the types of relation between them, and dynamic updating of the ontology knowledge base;
3) support, based on the expert knowledge base built in 1) and the ontology knowledge base obtained in 2), for important applications such as intelligent solution search.
Brief description of the drawings
Fig. 1 shows the working relationships among the modules of the language processor system according to an embodiment of the present invention;
Fig. 2 shows an example fragment of the expert knowledge base according to an embodiment of the present invention;
Fig. 3 shows an example fragment of the ontology knowledge base according to an embodiment of the present invention;
Fig. 4 shows the main flowchart of knowledge retrieval, a typical application of the results of implementing the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
One embodiment of the present invention provides a knowledge acquisition system and method based on a patent database. In one embodiment, the language processor system provides the expert knowledge base 2 and the ontology knowledge base 3 that a search technique needs in order to find accurate and complete solutions.
Fig. 1 shows, according to an embodiment of the present invention, how the expert knowledge base 2 and the ontology knowledge base 3 required for accurate and complete search are produced. As shown in Fig. 1, the expert knowledge processor 10 receives a patent text from the patent database 8 and processes it with the help of the language knowledge base 1. The preprocessor 12 performs word-form recognition and sentence splitting, the morphological processor 13 tags parts of speech, and the syntactic processor 14 identifies syntactic structures. On this basis the semantic processor 15 annotates the meaning expressed by each main syntactic structure, which yields patent text annotated with rich linguistic information. The natural language synthesizer 16 then generates a structured knowledge entry, namely the required solution knowledge, imports it into the expert knowledge base 2, and creates or updates the SVPO-based semantic index.
In one embodiment, the patent database 8 stores a certain number of patent texts. Every patent text has a specific structure; a United States patent, for example, contains required sections and headings such as "Title", "Abstract", "Issue Date" and "Claims". In addition, the patent database 8 of the present invention requires each patent text to be highly representative and to differ from the others in technical field and/or solution.
In one embodiment, the language knowledge base may include analysis rules, a lemmatization dictionary, linguistic logic and a classification of noun phrases, and supplies the lexical knowledge and language-structure knowledge needed for the language analysis of patent text. The language-structure knowledge, with patent text as the object of analysis, describes the linguistic logic and modes of expression peculiar to patents, for example "The object of the invention is X" and "The X according to claim N, characterized in that Y", where X and Y may be any word or sentence and N is any combination of digits. The language knowledge base supports the processing of patent text.
Fig. 2 shows a fragment of the expert knowledge base 2 as an example, illustrating its structure and content. Generating a knowledge entry is the task of the expert knowledge processor.
Each knowledge entry in the expert knowledge base 2 represents one solution. Studies show that most inventions can be expressed in a form called a "technical function", that is, the VPO form, which captures the formal characteristics of a problem. As the semantic expression of a knowledge entry, each solution is expressed as one natural language sentence and comprises four fields corresponding to the basic functions of "SVPO". S denotes a solution to the problem, and the problem is represented by VPO, where V denotes the verb, P the parameter and O the object. For the knowledge entry shown in Fig. 2, "Calcium sulfate prevents absorption of fat", the SVPO representation is:
SVPO: S (Calcium sulfate) V (prevent) P (absorption) O (fat).
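The bracketed SVPO notation used here is regular enough to be read and written mechanically. The sketch below is one possible in-memory form; the SVPO class, the FIELD pattern and parse_svpo are hypothetical names, and treating S and P as optional is an assumption based on the VO and VPO forms that appear elsewhere in this description.

```python
import re
from typing import NamedTuple, Optional

class SVPO(NamedTuple):
    """One knowledge entry; subject and parameter may be absent (VO or VPO queries)."""
    subject: Optional[str]
    verb: str
    parameter: Optional[str]
    obj: str

    def __str__(self):
        # Render only the fields that are present, in S, V, P, O order.
        return " ".join(f"{k} ({v})" for k, v in zip("SVPO", self) if v is not None)

# A field is one of the letters S, V, P, O followed by a parenthesized value.
FIELD = re.compile(r"([SVPO])\s*\(([^)]*)\)")

def parse_svpo(text):
    """Parse notation such as 'S (coil) V (increase) P (temperature) O (water)'."""
    found = {key: value.strip() for key, value in FIELD.findall(text)}
    return SVPO(found.get("S"), found["V"], found.get("P"), found["O"])
```

Because the pattern requires a parenthesis after the field letter, a leading label such as "SVPO:" is skipped rather than misread as four fields.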
Fig. 3 shows a fragment of the ontology knowledge base 3, illustrating its structure and content. The ontology knowledge base may be a hierarchical database of words from different knowledge domains, where a "word" denotes a concept. The words of the ontology knowledge base are linked by three kinds of relation: the synonym relation, the genus relation and the association relation.
The synonym relation is the semantic relation between two words or two lexical structures that express the same meaning in a given context. It covers direct synonyms, such as "clear", "rectify", "purify" and "refine", and also syntactic synonyms, that is, different syntactic structures that express the same (or a similar) meaning, such as "dehydrate" and "decrease relative humidity".
The genus relation, also called the parent-class/subclass relation, is the semantic relation between two words or two lexical structures that denote a parent concept and a subclass concept within a given group of concepts, for example "water -> channel", "water -> bay" and "physical thing -> water".
The association relation is the semantic relation between two words or two lexical structures that are associated with each other. Two associated words or lexical structures share the same parent, that is, they are subclass concepts under the same parent concept, for example "channel <-> bay".
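The three relations can be held in a small data structure in which the association relation is derived from shared parents rather than stored. This is only a sketch of one possible layout; the Ontology class and its method names are assumptions, and the source does not say how Fig. 3 is stored.

```python
from collections import defaultdict

class Ontology:
    """Holds synonym and genus relations; association is computed from shared parents."""

    def __init__(self):
        self.parents = defaultdict(set)    # concept -> parent concepts
        self.children = defaultdict(set)   # concept -> subclass concepts
        self.synsets = {}                  # word -> set of mutually synonymous words

    def add_genus(self, parent, child):
        """Record a parent-class/subclass pair such as water -> bay."""
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def add_synonyms(self, *words):
        """Merge the given words, and any synonyms they already have, into one set."""
        merged = set(words).union(*(self.synsets.get(w, ()) for w in words))
        for word in merged:
            self.synsets[word] = merged

    def synonyms(self, word):
        return self.synsets.get(word, {word}) - {word}

    def associated(self, concept):
        """Concepts that share at least one parent with the given concept."""
        siblings = set()
        for parent in self.parents.get(concept, ()):
            siblings |= self.children[parent]
        return siblings - {concept}
```

Loading the examples from the text (water -> channel, water -> bay) makes associated("channel") return {"bay"}, which matches the "channel <-> bay" example.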
In one embodiment, the ontology terms and relations extracted from a patent text by the ontology processor 11 are submitted to the ontology updater 20. This module compares the new terms and relations with the existing ones and thereby completes the update of the ontology knowledge base. Specifically, if the two terms "territorial waters" and "waterfall" are obtained from a patent text, the ontology updater 20 determines whether each of them already exists in the ontology knowledge base and locates it there. Once a term is located, its hypernyms and synonyms are known: the hypernym of both "territorial waters" and "waterfall" is "water", and a synonym of "waterfall" is "falls".
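The detect-and-locate behaviour of the updater can be sketched as a merge over a dictionary. The function name update_ontology, the dictionary layout and the "added"/"located" report values are all hypothetical; the source describes the behaviour but not a data format.

```python
def update_ontology(knowledge_base, extracted):
    """Merge extracted terms into the knowledge base and report what happened.

    Both arguments map a term to {"hypernyms": set, "synonyms": set};
    either key may be missing in `extracted`.
    """
    report = {}
    for term, found in extracted.items():
        entry = knowledge_base.get(term)
        if entry is None:
            # Detection failed: the term is new, so create an entry for it.
            knowledge_base[term] = {
                "hypernyms": set(found.get("hypernyms", ())),
                "synonyms": set(found.get("synonyms", ())),
            }
            report[term] = "added"
        else:
            # The term was located: merge any newly found relations into it.
            entry["hypernyms"] |= set(found.get("hypernyms", ()))
            entry["synonyms"] |= set(found.get("synonyms", ()))
            report[term] = "located"
    return report
```

After a term is located, its stored hypernyms and synonyms are available to the caller, which corresponds to the "waterfall" case in the paragraph above.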
Fig. 4 is a flowchart of how the results of one embodiment of the present invention, namely the expert knowledge base 2 and the ontology knowledge base 3, are applied to knowledge retrieval.
Fig. 4 shows the main flowchart of knowledge retrieval, a typical application of the results of implementing the present invention. It is a structural and functional block diagram of the language processing module used to solve inventive problems and users' technical problems, and it illustrates a typical use of the expert knowledge base 2 and the ontology knowledge base 3.
In one embodiment, the language knowledge base may include analysis rules, a lemmatization dictionary, linguistic logic and a classification of noun phrases; it supplies the lexical knowledge and language-structure knowledge needed for the language analysis of patent text and can give the formal semantic representation corresponding to a user's search request. With the help of the language knowledge base 1, the formal semantic representation of the user's search request, verb-parameter-object (VPO), is obtained; with the help of the ontology knowledge base 3, the user query is parsed and semantically expanded, and the retrieved solutions are classified; with the help of the expert knowledge base 2, the solutions for a specific query are determined. In one embodiment, the output of the language processing module shown in Fig. 4 in response to a user request is the set of solutions arranged according to their semantics.
The processing of a user query shown in Fig. 4 proceeds as follows:
Example query: How to measure thickness of ice
Structured form: V (measure) P (thickness) O (ice)
After analysis, a user query takes the form of a VPO structure, as in the example above. This structure is submitted to the query expansion module, which uses the ontology hierarchy to perform semantic expansion so as to retrieve as many solutions relevant to the problem as possible.
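For queries phrased like the example, the VPO structure can be recovered with a single pattern. This is a deliberately narrow sketch: it assumes the query has the shape "How to VERB [PARAMETER of] OBJECT", and the names QUERY and query_to_vpo are hypothetical. The source relies on the language knowledge base for this analysis, not on one regular expression.

```python
import re

# Assumed query shape: "How to <verb> [<parameter> of] <object>".
QUERY = re.compile(
    r"^how to (?P<V>\w+) (?:(?P<P>.+?) of )?(?P<O>.+?)\??$",
    re.IGNORECASE,
)

def query_to_vpo(query):
    """Return a dict with keys V, O and, when present, P; None if the shape differs."""
    match = QUERY.match(query.strip())
    if match is None:
        return None
    # Drop the parameter key when the optional "<parameter> of" part is absent.
    return {key: value for key, value in match.groupdict().items() if value is not None}
```

A query without a parameter, such as "How to heat water", yields only the V and O fields, which corresponds to the VO form used in the later examples.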
A VPO query can be expanded in any of several ways. The following expansions are available:
Synonym expansion (the verb, the parameter and the object are expanded);
Genus expansion (upward or downward in the hierarchy; only the object is expanded); and/or
Association expansion (only the object is expanded).
In synonym expansion, each word of the user query is replaced by its synonyms. For the example above:
Structured form: V (measure) P (thickness) O (ice)
Output (synonym expansion):
V (measure, detect, gage, gauge, log, measure out, meter, quantify, register)
P (no synonyms)
O (water ice)
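Synonym expansion amounts to a dictionary lookup per field followed by enumeration of the combinations. In this sketch the synonym lists are copied from the output above, while the names SYNONYMS, expand_synonyms and variants, and the decision to keep the original term first in each list, are assumptions.

```python
from itertools import product

# Synonym lists copied from the example output above; a real system would
# read them from the ontology knowledge base 3.
SYNONYMS = {
    "measure": ["detect", "gage", "gauge", "log", "measure out", "meter",
                "quantify", "register"],
    "ice": ["water ice"],
}

def expand_synonyms(query):
    """Map each VPO field to the original term followed by its synonyms."""
    return {field: [term] + SYNONYMS.get(term, []) for field, term in query.items()}

def variants(expanded):
    """Enumerate every concrete VPO combination of the expanded query."""
    fields = list(expanded)
    return [dict(zip(fields, combo)) for combo in product(*expanded.values())]
```

For the example query this gives 9 verb alternatives, 1 parameter and 2 objects, so 18 concrete queries would be compared against the expert knowledge base.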
Genus expansion replaces a term of the query with terms related to it in the hierarchy. There are two kinds of genus expansion. One is bottom-up (from the specific to the general), for example:
Structured form: V (measure) P (thickness) O (ice)
Output (bottom-up genus expansion; only the parent relation of the object is expanded):
O (dimension)
The other is top-down (from the general to the specific), for example:
Structured form: V (measure) P (thickness) O (ice)
Output (top-down genus expansion; only the subclass relation of the object is expanded):
O (half thickness, half-value thickness, half-thickness)
Genus expansion can retrieve solutions that are more specific, more general, or more broadly related.
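Both directions of genus expansion are walks over a parent map. The sketch below uses a toy hierarchy built on the acid examples that this description uses when classifying results; the entry "acid" -> "chemical compound" and all function names are hypothetical additions for illustration.

```python
# Hypothetical hierarchy fragment: child concept -> parent concept.
PARENT = {
    "hydrochloric acid": "acid",
    "nitric acid": "acid",
    "acid": "chemical compound",   # invented level, for illustration only
}

def expand_up(term, levels=1):
    """Bottom-up: collect up to `levels` ancestors of the term, nearest first."""
    result = []
    while levels > 0 and term in PARENT:
        term = PARENT[term]
        result.append(term)
        levels -= 1
    return result

def expand_down(term):
    """Top-down: collect the direct subclasses of the term."""
    return sorted(child for child, parent in PARENT.items() if parent == term)

def expand_genus(query, direction):
    """Return alternative queries in which only the object O is replaced."""
    terms = expand_up(query["O"]) if direction == "up" else expand_down(query["O"])
    return [{**query, "O": term} for term in terms]
```

Limiting expand_up to one level by default is a design choice of this sketch: each extra level makes the retrieved solutions more general and less likely to be relevant.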
Association expansion replaces a term with the terms associated with it. For example:
Example query: How to measure thickness of ice
Structured form: V (measure) P (thickness) O (ice)
Output (association expansion of the object O only):
O (creaminess, soupiness, critical thickness, ...)
The goal of solution retrieval is to look up solutions in the expert knowledge base 2 using the expanded query and to list the solutions found. The search engine compares the VPO fields in the expert knowledge base 2 with the expanded query, and matches between these fields retrieve the relevant solutions. Because the results differ in nature, they need to be classified according to their semantic relation to the query, as follows:
(1) Exact solutions: the VO/VPO fields of these solutions match the original VO/VPO of the query exactly.
Example: V (heat) O (water)
Solution: S (coil) V (increase) P (temperature) O (water)
(2) Specific solutions: at least one of the VO/VPO fields of these solutions is a special case of the corresponding field in the query.
Example: V (measure) P (thickness) O (ice)
Solution: S (ultrasonic probe) V (measure) P (thickness) O (frost)
(3) General solutions:
Example: V (neutralize) O (hydrochloric acid)
Solution: S (alkali) V (neutralize) O (acid)
(4) Analogous solutions:
Example: V (neutralize) O (hydrochloric acid)
Solution: S (alkali) V (neutralize) O (nitric acid)
In the above examples, S denotes the "subject term", that is, the approach to solving the problem.
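The four result classes can be told apart by comparing each query field with the corresponding solution field through the genus relation. This is a minimal sketch under stated assumptions: the PARENT map is reconstructed from the examples above, the names relation and classify are hypothetical, and the rule that a mixed result takes the loosest class found (analogous, then general, then specific) is a choice of this sketch, not something the source specifies.

```python
# Child -> parent pairs implied by the examples above.
PARENT = {"frost": "ice", "hydrochloric acid": "acid", "nitric acid": "acid"}

def relation(query_term, solution_term):
    """Relation of a solution field to the query field, or None if unrelated."""
    if solution_term is None:
        return None
    if query_term == solution_term:
        return "exact"
    q_parent, s_parent = PARENT.get(query_term), PARENT.get(solution_term)
    if s_parent == query_term:
        return "specific"     # solution term is a subclass of the query term
    if q_parent == solution_term:
        return "general"      # solution term is the parent of the query term
    if q_parent is not None and q_parent == s_parent:
        return "analogous"    # both terms share the same parent
    return None

def classify(query, solution):
    """Classify a solution against a VO/VPO query; None means it does not match."""
    kinds = set()
    for field in ("V", "P", "O"):
        if field not in query:
            continue
        kind = relation(query[field], solution.get(field))
        if kind is None:
            return None
        kinds.add(kind)
    for kind in ("analogous", "general", "specific"):
        if kind in kinds:
            return kind
    return "exact"
```

The exact-solution example above pairs V (heat) with V (increase) P (temperature); recognizing that pair would additionally need the syntactic-synonym relation, which this sketch leaves out.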
The embodiments of the present invention are only particular cases of implementing the invention, and the scope of protection of the invention is not limited to them.
The processing, computation, determination and the like described in the present invention are all operations on and transformations of data.
Embodiments of the present invention include the devices that perform these operations.
Although some embodiments of the present invention have been described above, it should be understood that these embodiments are only specific examples of implementing the invention and should not limit its scope of protection. The scope of protection of the present invention should not be limited by the description, but should be defined by the claims and their equivalents. Any changes that those skilled in the art make to the embodiments of the invention on the basis of the above description and explanation fall within the scope of protection of the present invention.

Claims (19)

1. A system for automatically analyzing patent text, characterized by comprising:
an expert knowledge processor, for extracting the full-text patent data in a patent database and representing it in structured form, generating an expert knowledge base, and automatically updating the expert knowledge base;
an ontology processor, for extracting ontology terms from the full-text patent data in the patent database and identifying the relations between them, generating an ontology knowledge base, and automatically updating the ontology knowledge base;
a language knowledge base, for providing the language analysis of a user query and its formal semantic representation, and for assisting the work of the expert knowledge processor and the ontology processor;
the expert knowledge base, which is a knowledge base of solutions to technical problems, is derived from many text documents, mainly patent data, and is generated by the expert knowledge processor;
the ontology knowledge base, which contains certain knowledge about the surrounding world, represented by many words from different knowledge domains and the semantic relations between these words, and is generated by the ontology processor;
wherein the expert knowledge processor and the ontology processor work in parallel, and the expert knowledge base and the ontology knowledge base likewise stand in a parallel relation.
2. The system according to claim 1, characterized in that the expert knowledge processor comprises:
a preprocessor, for word-form recognition and sentence splitting;
a morphological processor, for tagging parts of speech;
a syntactic processor, for identifying syntactic structures;
a semantic processor, for annotating the meaning expressed by each main syntactic structure, thereby obtaining patent text annotated with rich linguistic information;
a natural language synthesizer, for generating a structured knowledge entry, importing it into the expert knowledge base, and creating or updating a semantic index.
3. The system according to claim 2, characterized in that the semantic index is based on the subject-verb-parameter-object (SVPO) form.
4. system according to claim 1 is characterized in that, described body processor comprises:
Pretreater is used to carry out morphology identification and sentence and splits;
The body recognizer is used to extract body;
Relationship identifier is used to discern the body relation;
The body renovator is used for body is imported the ontology knowledge base, and the ontology knowledge base is upgraded automatically.
5. system according to claim 1 is characterized in that, described body renovator can also be realized detection and the location of obtaining body in the ontology knowledge base.
6. system according to claim 1 is characterized in that the semantic relation of described word comprises synonymy, race relation and incidence relation at least.
7. The system according to claim 1, characterized in that the solutions in the expert knowledge base are expressed in the subject-verb-parameter-object (SVPO) form.
8. The system according to claim 1, characterized in that the language knowledge base comprises at least analysis rules, a lemmatization dictionary, linguistic logic, and a classification of noun phrases; it can provide the lexical knowledge and language-structure knowledge required for linguistic analysis of patent text, and can provide the formal semantic representation corresponding to a user's search request.
9. The system according to claim 1, characterized in that the patent database is a language-independent database that stores a number of patent texts.
10. The system according to claim 1, characterized in that the patent database is a full-text patent database or a patent claims database.
11. A method for automatically analyzing patent text, characterized by comprising the following steps:
By means of the language knowledge base, using the expert knowledge processor to extract and structurally represent the full-text patent data in the patent database, generating the expert knowledge base, and updating the expert knowledge base automatically;
By means of the language knowledge base, using the ontology processor to extract ontologies from the full-text patent data in the patent database and identify the relations between ontologies, generating the ontology knowledge base, and updating the ontology knowledge base automatically.
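The two steps of this method draw on the same patent data and the same language knowledge base but do not depend on each other's output. The sketch below reads the "parallel relation" as meaning the two processors can run as independent tasks; that reading is an assumption, and both processor functions are deliberately trivial stand-ins with hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def build_expert_knowledge(patents, language_kb):
    """Stand-in expert knowledge processor: keep sentences containing a known verb."""
    verbs = language_kb["verbs"]
    return [
        sentence.strip()
        for text in patents
        for sentence in text.split(".")
        if any(v in sentence.lower().split() for v in verbs)
    ]

def build_ontology(patents, language_kb):
    """Stand-in ontology processor: collect words that are neither stop words nor verbs."""
    skip = language_kb["stop_words"] | language_kb["verbs"]
    return {
        word
        for text in patents
        for word in text.lower().replace(".", " ").split()
        if word not in skip
    }

def analyze(patents, language_kb):
    """Run both processors as independent tasks over the same inputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        expert_future = pool.submit(build_expert_knowledge, patents, language_kb)
        ontology_future = pool.submit(build_ontology, patents, language_kb)
        return expert_future.result(), ontology_future.result()
```

Neither task reads the other's result, so either knowledge base could be rebuilt or updated on its own schedule.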
12. The method according to claim 11, characterized in that the step of obtaining the expert knowledge base comprises:
A preprocessor performs lexical recognition and sentence splitting;
A morphological processor annotates parts of speech;
A syntactic processor identifies syntactic structure;
A semantic processor annotates the meaning expressed by each main syntactic structure, thereby producing patent text annotated with complex linguistic information;
A natural language synthesizer generates a structured knowledge entry, imports it into the expert knowledge base, and creates or updates the semantic index.
13. The method according to claim 12, characterized in that the semantic index is based on the subject-verb-parameter-object (SVPO) form.
14. The method according to claim 11, characterized in that the step of obtaining the ontology knowledge base comprises:
A preprocessor performs lexical recognition and sentence splitting;
An ontology recognizer extracts ontologies;
A relation recognizer identifies relations between ontologies;
An ontology updater updates the ontology knowledge base automatically.
15. The method according to claim 11, characterized in that the ontology updater can further detect and locate acquired ontologies in the ontology knowledge base.
16. The method according to claim 11, characterized in that the solutions in the expert knowledge base are expressed in the subject-verb-parameter-object (SVPO) form.
17. The method according to claim 11, characterized in that the language knowledge base comprises at least analysis rules, a lemmatization dictionary, linguistic logic, and a classification of noun phrases; it can provide the lexical knowledge and language-structure knowledge required for linguistic analysis of patent text, and can provide the formal semantic representation corresponding to a user's search request.
18. The method according to claim 11, characterized in that the patent database is a language-independent database that stores a number of patent texts.
19. The method according to claim 11, characterized in that the patent database is a full-text patent database or a patent claims database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810085054A CN101539906A (en) 2008-03-17 2008-03-17 System and method for automatically analyzing patent text

Publications (1)

Publication Number Publication Date
CN101539906A (en) 2009-09-23

Family

ID=41123097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810085054A Pending CN101539906A (en) 2008-03-17 2008-03-17 System and method for automatically analyzing patent text

Country Status (1)

Country Link
CN (1) CN101539906A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013153556A2 (en) * 2012-03-16 2013-10-17 N Sringeri OMPRAKASH Document analysis system
WO2013153556A3 (en) * 2012-03-16 2013-12-05 N Sringeri OMPRAKASH Document analysis system
WO2014044167A1 (en) * 2012-09-18 2014-03-27 Orcastras Technology International Co., Ltd. Method and computer for indexing and searching structures
CN103914486A (en) * 2013-01-08 2014-07-09 邓寅生 Document search and display system
CN103914486B (en) * 2013-01-08 2017-02-15 邓寅生 Document search and display system
CN104199809A (en) * 2014-04-24 2014-12-10 江苏大学 Semantic representation method for patent text vectors
CN105468933A (en) * 2014-08-28 2016-04-06 深圳先进技术研究院 Biological data analysis method and system
CN105468933B (en) * 2014-08-28 2018-06-15 深圳先进技术研究院 biological data analysis method and system
CN104361507A (en) * 2014-11-20 2015-02-18 携程计算机技术(上海)有限公司 Commodity recommending method and system
CN107644051A (en) * 2016-07-20 2018-01-30 百度(美国)有限责任公司 System and method for the packet of similar entity
CN108763445A (en) * 2018-05-25 2018-11-06 厦门智融合科技有限公司 Construction method, device, computer equipment and the storage medium in patent knowledge library
WO2019223793A1 (en) * 2018-05-25 2019-11-28 厦门智融合科技有限公司 Patent knowledge base construction method, apparatus, computer device, and storage medium
US11714787B2 (en) 2018-05-25 2023-08-01 ZFusion Technology Co., Ltd. Xiamen Construction method, device, computing device, and storage medium for constructing patent knowledge database

Similar Documents

Publication Publication Date Title
Zhang et al. Entity linking leveraging automatically generated annotation
CN104361127B (en) The multilingual quick constructive method of question and answer interface based on domain body and template logic
CN101042692B (en) translation obtaining method and apparatus based on semantic forecast
CN100416570C (en) FAQ based Chinese natural language ask and answer method
KR101522049B1 (en) Coreference resolution in an ambiguity-sensitive natural language processing system
CN101539906A (en) System and method for automatically analyzing patent text
CN103324700B (en) Noumenon concept attribute learning method based on Web information
CN101510221A (en) Enquiry statement analytical method and system for information retrieval
Wang et al. Challenges in chinese knowledge graph construction
KR20100041482A (en) Apparatus and method for search of contents
CN115186050B (en) Method, system and related equipment for recommending selected questions based on natural language processing
Moldovan et al. Temporal context representation and reasoning
CN105760462A (en) Man-machine interaction method and device based on associated data query
CN105677725A (en) Preset parsing method for tourism vertical search engine
Garrido et al. GEO-NASS: A semantic tagging experience from geographical data on the media
Sateli et al. What's in this paper? Combining Rhetorical Entities with Linked Open Data for Semantic Literature Querying
Shrivastava et al. Morphology based natural language processing tools for indian languages
Kešelj et al. A SUFFIX SUBSUMPTION-BASED APPROACH TO BUILDING STEMMERS AND LEMMATIZERS FOR HIGHLY INFLECTIONAL LANGUAGES WITH SPARSE RESOURCES.
Tran et al. A model of vietnamese person named entity question answering system
TWM623980U (en) System of screening for text data relevance
Keyaki et al. Part-of-speech tagging for web search queries using a large-scale web corpus
Sang et al. Extraction of hypernymy information from text∗
Mahajani et al. Ranking-based sentence retrieval for text summarization
Mallat et al. Proposal of statistical method of semantic indexing for multilingual documents
Shi et al. Information extraction for computer science academic rankings system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20090923