CN1752966A - Method of solving problem using wikipedia and user inquiry treatment technology - Google Patents

Publication number
CN1752966A
Authority
CN
China
Prior art keywords
methods according
formula
semantic
ontology
user search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200410078337
Other languages
Chinese (zh)
Other versions
CN100361126C (en)
Inventor
张国明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
YIWEIXUN SCIENCE AND TECHNOLOGY Co Ltd BEIJING
Original Assignee
YIWEIXUN SCIENCE AND TECHNOLOGY Co Ltd BEIJING
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by YIWEIXUN SCIENCE AND TECHNOLOGY Co Ltd BEIJING filed Critical YIWEIXUN SCIENCE AND TECHNOLOGY Co Ltd BEIJING
Priority to CNB2004100783370A priority Critical patent/CN100361126C/en
Publication of CN1752966A publication Critical patent/CN1752966A/en
Application granted granted Critical
Publication of CN100361126C publication Critical patent/CN100361126C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention describes a system, method and computer program based on an ontological approach that can be used to represent and process data/knowledge so as to solve technical problems in a semantic processing module. The basic components of the semantic processing module include a semantic knowledge base, an ontology knowledge base and/or an expert knowledge base. The problems solved can include a user's technical problem or an inventive problem. The invention also provides the concrete steps of the method.

Description

Method for solving problems using ontology and user query processing technology
Technical field
The present invention relates to the automated solving of problems and, more particularly, to problem solving that uses semantic methods for representing and processing information and knowledge.
Background technology
Solving a user's technical problem first requires good technical support, that is, ready access to information and knowledge bases that can answer how to solve the problem, or that help supply information related to solving it, for example by drawing on problems from other fields of knowledge or on problems of other types within the same system. This can point the user toward a solution. Traditionally, computer-based information retrieval has been carried out by search engines.
In a simple information retrieval system, retrieval is performed by searching the database text for keywords entered by the user. Such retrieval is characterized by low precision and low recall. A modern information retrieval system should let the user formulate queries in natural language, i.e. the system should have a natural language user interface. The query is then subjected to automatic linguistic analysis to generate a formal expression. Linguistic analysis can be performed at different depths of natural language. Ideally, it should include analysis at the semantic level. It is important to recognize not only the relations between the different elements of the query (usually the most informative elements), but also the relations between the elements of the query and the corresponding elements of the outside world or of a particular field of knowledge. This requires using the semantic relations between concepts, for example as expressed in a thesaurus or ontology used for knowledge description, to improve the performance of information retrieval systems in various ways across applications.
An ontology is a hierarchical lexical structure in which the concepts defined by words and word combinations stand in semantic relations to one another. Depending on the words it describes and on the human knowledge of a specific field or of the surrounding world that it attempts to reflect, an ontology can be domain-specific or general. Because an ontology represents a valuable and extensible body of data, it can be used in information retrieval to improve retrieval accuracy.
The information retrieval system described in U.S. patent US6675159B1 indexes a collection of documents using an ontology-based predicate structure. The system described in that patent can only return documents that exactly match the concepts in the user query. It also has some ontology-based retrieval capability: it can retrieve logically structured phrases from the ontology. For example, for the query "What is the current situation of the stock market?", an attribute extractor extracts the direct attributes "current", "situation", "stock" and "market"; the attribute extractor can also use an ontology containing a concept hierarchy to expand the attribute "stock" to "finance", "banks", "brokerages", "Wall Street", and so on.
The knowledge retrieval and search system described in U.S. patent US5940821, and the knowledge-based retrieval and search system for relevant documents described in U.S. patent US6460034B1, use a knowledge base to identify the themes of a document (deriving themes from the document's terminology) and to classify documents; the knowledge base stores associations between terms and categories, including lexical, semantic and usage associations. With the help of the knowledge base, documents relevant to a query can be retrieved by expanding the query terms and themes. The system supports factual knowledge base queries and conceptual knowledge base queries. A factual knowledge base query identifies the themes relevant to a query and the documents classified under those themes. In contrast, a conceptual knowledge base query identifies potentially relevant documents by presenting related categories and themes.
The systems of these two patents comprise a language engine, a knowledge classification processor, a theme vector processor and a morphology section. The language engine comprises a lexical analyzer and a theme analyzer, and analyzes not only the lexical and contextual aspects of the document collection but also the type and theme attributes of each document. Specifically, the language engine generates contextual tags, thematic tags, type tags and the like for each document as part of its output.
The knowledge base of these two patents is used to generate an expanded set of query terms, and the expanded set is used to select additional documents. To expand the query terms using the knowledge base, the levels of the classification hierarchy and the associations of the knowledge base are used to select nodes according to predetermined rules. In one embodiment, the strength of a query term is reduced by weighting: for example, at each step in the hierarchy, when the term is expanded to a more general category or to an association, the weight of the query term is reduced by 50%. Finally, all nodes whose query term weight is greater than one are selected, together with all subcategories and terms under those nodes.
However, the systems described in these two patents are based mainly on theme vector identification. They require the texts in the searched database to be indexed by special contextual tags, theme tags and type tags, and they search for additional documents by theme vector, using ontology-based term expansion.
Ontologies have also traditionally been applied to database management systems. In international patent application WO 2003/030025A1, the described database management system uses an ontology to address semantic heterogeneity, semantic mismatch and query integration across distributed resources. Semantic heterogeneity is resolved by formally specifying the meaning of the terms in each system using an ontology (shared or private). The system described in that application thus provides a distributed query solution for a network with multiple database resources. The network helps the user retrieve and organize data from multiple repositories, which may be, for example, SQL or XML databases.
The system described in that application therefore eliminates the ambiguity of polysemous words when retrieving information from different information resources.
In U.S. Pat 2002/0107844A1, information generate and searching system in ontology as an instrument, set up the semantic expressiveness of sentence with the form help of concept map.In information retrieval process, the sentence structure and the semantic structure of the retrieval type of the natural language of user's input are converted into concept map by analysis, and the immediate concept map of retrieval is shown to the user in database then.
Like this, the applied ontology opinion is hinting and is setting up concept relation graph in information retrieval, and compares the concept map of retrieval type and database.
The active information discovery and retrieval system described in U.S. patent US6498795B1 uses an active network architecture and ontology-based information hierarchies to structure semantics and bind information automatically, and provides a symmetric information framework for filtering information and binding it within the network. Queries are routed directly to the relevant information sources, and information is distributed to suitable destinations.
That patent implies generating, at each node of the active network, a content ontology tree and a query ontology tree. The active network architecture and the ontology-based information hierarchy serve as the network framework and the semantic framework, respectively. The system uses Simple HTML Ontology Extensions (SHOE) for the Hypertext Markup Language (HTML). When a SHOE instance makes a claim based on a particular ontology, software can automatically infer, via that ontology, knowledge that is not stated directly. The ontology thus provides for the extension of implicit knowledge. SHOE tags allow new ontologies to be defined by extending existing ones. The retrieval operation model can be applied to any part of the ontology tree hierarchy. Special coefficients are calculated to determine the likelihood that a child node in the ontology is relevant to its parent node.
That patent therefore retrieves information using semantic structure, which implies adding ontology tags to the information sources (automatically or manually, using SHOE); information can then be retrieved according to the SHOE tags on the basis of ontological relations.
In U.S. Pat 2002/0116169A1, described that character string is generated the method and apparatus that normalization is represented.Use ontology, dictionary and terminological data bank as normalized device at this.
Above-mentioned patent attempts to increase the retrieval characteristic of information retrieval system, particularly uses ontology to come the character string of normalization ground expression retrieval type and database.
U.S. Pat 2003/0177112 has been described a kind of based on ontological information management system and method, between structurized data source and non-structured data source, use ontology that semantic matches is provided, and comprise generation, requirement that process such as rationalize, examine, unite satisfies life science, informationization and other principle.This patent suggestion use ontology effectively activates morphology and the semantic matches between the clauses and subclauses.
Above-mentioned patent is used the information retrieval engine that can sort out big collection of document, estimates the space length between a pair of information.
The described method of above-mentioned patent is mainly derived from based on ontological information source management, and this helps integrated morphologyization and non-structured data.Information source is to generate new ontological source.The information retrieval engine is based on to classification of Data.
Ontologies are also used for query expansion. In U.S. patent US5822731, a semantic network is used to maximize the number of relevant documents retrieved by expanding each term of the query with related terms.
U.S. patent US2001/0003183A1 describes a method and apparatus for searching a knowledge base. The ontology is an integral part of that system. A library of query templates and a dictionary of keywords associated with highly abstract concepts are represented in the computer system. Each template contains one or more typed variables. A query can be generated by entering one or more keywords into the system. Each keyword is abstracted into a concept (using various dictionaries and ontologies). A concept is selected through further abstraction, by choosing among several candidate concepts or by successively abstracting or rejecting different keywords, and each concept is refined further until an acceptable concept is found. Then all query templates that use the resulting concepts or keywords are found, and the user selects the most appropriate query. The system described in that patent can form queries against any set of sources. It is particularly suited to distributed access to heterogeneous database systems that have no single standardized vocabulary or structure.
In effect, the last three methods represent keyword-based query expansion using different variants of the ontology mechanism.
Japanese patent JP2000222436 describes an information retrieval method and device supported by an ontology. The method can present selection menus of different databases for the search information entered by the user. The ontology described in that patent manages the information sources of the databases as a tree structure, from upper levels to lower levels, and the database selection menu likewise moves intelligently from a high to a low level of abstraction. In short, the patent proposes using an ontology to reflect the contents of databases and to help the user refine or generalize a concept.
U.S. patent US20020147578A1 describes a query formulation system for information retrieval. The system uses semantic or lexical information to eliminate one or more irrelevant words, reformulates the query, and searches the information repository with the reformulated query. Many relevant dictionaries, thesauri and ontologies are used in processing each question.
Here, the ontology is used to eliminate irrelevant words and reformulate the query.
The information retrieval system that U.S. Pat 6363378B1 describes uses ontology that the retrieval feedback is sorted.Described information retrieval system is handled retrieval type, and the theme of discern theme relevant with retrieval type and retrieving feedback result links the interdependent node of these themes in the knowledge base then.At least from knowledge base, select a host node based on theme, judge the notion degree of approximation between host node and the retrieval feedback node then.Use the hierarchical relational in the ontology to come the computing semantic degree of approximation at this.According to the notion degree of approximation result for retrieval is sorted at last.
The theme that above-mentioned patent uses ontology to carry out in knowledge base and the retrieval type is discerned, the calculating of the lang justice degree of approximation of going forward side by side.
Therefore, the idea of using ontology to improve the information retrieval system characteristic is not new; It is disclosed with various forms in various patents.For example, the retrieval of normalization, different types of data of the theme that uses the retrieving structured and unstructured database of diverse ways, identification document or subject term, character string semantic expressiveness and expansion of integrated, retrieval type or the like are disclosed.To with regard to the retrieval type expansion, in general use the ontology expansion with regard to present use ontology, and use ontological hierarchical relational in the specific knowledge field based on keyword with based on the retrieval of notion.
Summary of the invention
One embodiment of the present invention comprises a system, method or computer program product that implements a language processing module and that, based on semantic methods for representing and processing data/knowledge, can solve, without limitation, inventive problems or users' problems. In one embodiment, the basic components of this module may include a language knowledge base (KB), an ontology knowledge base and/or an expert knowledge base.
According to one embodiment of the invention, the language knowledge base supports linguistic analysis of a user query and its formal semantic representation, Verb-Parameter-Object (VPO), also called a "technical function", which is also a formal characterization of the problem.
The ontology knowledge base may contain certain knowledge of the surrounding world, represented by many words (concepts and verbs) from different fields of knowledge and by the semantic relations between these words, for example synonymy, kind (parent/child) relations and association relations.
With the help of the ontology knowledge base, the language processing module can perform semantic expansion. The language processing module provides maximum recall and precision of information retrieval: a search for a particular problem can find both solutions and analogous solutions, which is very important for tasks of this kind. In addition, the user can vary the degree of semantic expansion according to the proximity of words in the ontology knowledge base.
In one embodiment, the expert knowledge base is a knowledge base of solutions to technical problems, derived from many text documents, mainly patents and papers. These solutions are represented in SVPO (Subject-Verb-Parameter-Object) form, where S is the subject, that is, the solution of the technical function defined by the VPO. By comparing the semantically expanded query with the solutions in the expert knowledge base, the language processing module can determine the solutions (including analogous solutions) for a particular query. In one embodiment, the output of the language processing module is these solutions, ordered semantically. As a result, the user can obtain precise, particular, general and analogous solutions to the query.
According to one embodiment of the invention, the language processing module can provide an effective solution to the user's query, processing information and knowledge by means of the language, ontology and expert knowledge bases and a series of semantic-method editing tools.
The use of an ontology greatly improves the performance of information retrieval systems, which process the main carrier of information, the document:
A correct semantic analysis of the user query can be provided;
Each important word of the query can be expanded using the ontology; and/or
The ontology can reflect the domain concepts and relations needed to accomplish the above tasks.
Our method is therefore distinguished by the following features:
1. A new method, based on language processing of documents (mainly patents), for solving, without limitation, inventive problems and users' technical problems;
2. In accordance with point 1, the language processing module provides:
A) a formal representation of the problem in VPO form;
B) automatic ontology-based semantic expansion of the formal problem representation;
C) automatic semantic indexing of the patent collection;
D) automatic retrieval of exact solutions in accordance with (2)(A) and (2)(B); and/or
E) automatic retrieval of exact solutions to more general problems, more specific problems and analogous problems.
3. The ontology technology is general, because:
A) the ontology can be applied to any field of knowledge;
B) the ontology reflects the main semantic categories, including concepts, their attributes and actions;
C) the ontology reflects the main semantic relations between semantic categories, including:
Main attribute relations;
Synonymy;
Hierarchical relations; and
Association relations;
D) the ontology has a mechanism for managing the depth of relations between semantic categories; and/or
E) the ontology is open to the user, i.e. techniques for editing the ontology can be provided.
Embodiments of the invention can therefore provide effective practical support for the professional activity of inventors and can help ordinary users solve problems.
Further features and advantages of the invention, as well as the structure and operation of its various embodiments, are described in detail below with reference to the accompanying drawings.
Description of drawings
The above and other features and advantages of the invention will become apparent from the description of the embodiments illustrated in the drawings. In the drawings, like reference numbers denote elements that are identical, functionally similar or structurally similar.
Fig. 1 is a structural and functional block diagram of a language processing module for solving inventive problems and users' technical problems, according to one embodiment of the invention;
Fig. 2 shows an example fragment of the ontology knowledge base of concepts, according to one embodiment of the invention; and
Fig. 3 is a structural and functional block diagram of the expansion function performed by the language processing module, according to one embodiment of the invention.
Embodiment
Overview of the language processing module for inventive problems and users' technical problems
One embodiment of the invention provides a method of solving problems. In one embodiment, a language processing module (LPM), together with a multi-component knowledge base (KB) covering natural language and the entities and relations of the particular field of interest, provides high-quality understanding of structured and unstructured queries entered by the user, and provides search techniques for finding the most precise and most complete set of relevant solutions.
Fig. 1 is a structural and functional block diagram of a language processing module for solving users' technical problems and inventive problems, according to one embodiment of the invention. As shown, the LPM in Fig. 1 receives a user query 104. Using language knowledge base 132, the LPM processes the user query (108) and generates a formal representation of the query. The LPM then uses ontology knowledge base 136 to expand the query semantically (116). The resulting set of search patterns is passed to the retrieval module 120 of the LPM. Using an expert knowledge base 140, the LPM determines all available solutions to the corresponding problem, ranks them by relevance (124), and lists them (128).
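The flow of Fig. 1 can be sketched as a chain of four steps. This is only an illustrative outline under stated assumptions: the function name `run_lpm`, the dictionary layout of queries and solutions, the `relevance` score and the toy stand-in functions (including the synonym "raise" and the solution "sun") are hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, List

Query = Dict[str, str]  # hypothetical formal query, e.g. {"V": ..., "P": ..., "O": ...}


def run_lpm(query_text: str,
            analyze: Callable[[str], Query],
            expand: Callable[[Query], List[Query]],
            search: Callable[[List[Query]], List[dict]]) -> List[dict]:
    """Chain the steps of Fig. 1; each step is supplied by the caller."""
    formal = analyze(query_text)   # step 108: formal representation of the query
    patterns = expand(formal)      # step 116: semantic expansion
    candidates = search(patterns)  # module 120: look up solutions
    # step 124: rank by relevance (assumed here to be a numeric score)
    return sorted(candidates, key=lambda s: s["relevance"], reverse=True)


# Toy stand-ins for the three knowledge-base-backed steps (all hypothetical).
def toy_analyze(text: str) -> Query:
    return {"V": "increase", "P": "temperature", "O": "water"}


def toy_expand(query: Query) -> List[Query]:
    return [query, {**query, "V": "raise"}]


def toy_search(patterns: List[Query]) -> List[dict]:
    return [{"S": "sun", "relevance": 0.4}, {"S": "fire", "relevance": 0.9}]


result = run_lpm("How to increase temperature of water?",
                 toy_analyze, toy_expand, toy_search)
print([s["S"] for s in result])
```

Passing the steps in as functions keeps the outline independent of how each knowledge base is actually stored.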
In one embodiment, language knowledge base 132 may include, but is not limited to, analysis rules, a lemmatization dictionary, and classifications of linguistic logic and noun phrases.
Knowledge base 136 may be a hierarchical database of words from different fields of knowledge. As used here, "word" denotes a concept (word-concept) or a verb (word-verb). Before describing the structure and content of the ontology database, the following definitions are needed:
Synonyms: a semantic relation between two words or two lexical structures that express the same meaning in a given context.
For example: "alter", "change", "modify", "vary", etc.
Direct synonyms: words or lexical structures that have the same (or a close) meaning independently of context.
For example: "water", "aqua", etc.
Syntactic synonyms: different lexical structures that express the same (or a close) meaning.
For example: "to heat", "to increase temperature".
Kind relation (parent/child relation): a semantic relation between two words or two lexical structures that are the parent concept and the child concept within a given group of concepts.
For example: "oxygen"-"gas", "increase"-"change", "temperature"-"parameter".
Association relation: a semantic relation between two words or two lexical structures that are related to each other. They are said to be in a "sibling" relation: they have the same parent, i.e. both are child concepts of the same parent concept.
Verb-Parameter-Object (VPO): the formal representation of a problem. Here the verb denotes the action of a technical function; the parameter denotes a particular characteristic of a technical system or of one of its elements (there may be no parameter, in which case the form is simply called VO); the object denotes the technical system, element or process that the technical function acts on.
For example:
Problem: How to increase temperature of water?
VPO: V(increase) P(temperature) O(water).
Subject (S): the solution to the problem defined by the VPO structure.
For example:
Fire increase temperature of water
SVPO: S(Fire) V(increase) P(temperature) O(water)
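The VPO and SVPO forms defined above map naturally onto small record types. The following is a minimal sketch; the class names `VPO` and `SVPO`, the field names and the `label` method are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VPO:
    """Formal problem: verb, optional parameter, object."""
    verb: str
    obj: str
    parameter: Optional[str] = None  # None gives the VO form

    def label(self) -> str:
        if self.parameter is None:
            return f"V({self.verb})O({self.obj})"
        return f"V({self.verb})P({self.parameter})O({self.obj})"


@dataclass(frozen=True)
class SVPO:
    """A solution: the subject that performs the technical function."""
    subject: str
    problem: VPO

    def label(self) -> str:
        return f"S({self.subject})" + self.problem.label()


problem = VPO(verb="increase", parameter="temperature", obj="water")
solution = SVPO(subject="Fire", problem=problem)
print(problem.label())
print(solution.label())
```

Keeping the parameter optional lets the same type carry both VPO and VO problems.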
Head word: the particular word in a noun phrase that determines the lexical properties of the whole noun phrase.
For example:
Noun phrase: cold water;
Head word: water.
Lemmatization: deriving the base form of a word from its inflected forms. The base form of a verb is the infinitive; the base form of a noun is the singular.
For example:
Verb: "moving"-"move";
Noun: "cars"-"car".
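Lemmatization as defined above can be illustrated with a dictionary lookup, in the spirit of the lemmatization dictionary held in language knowledge base 132. This is a toy sketch: the table `LEMMAS` holds only the two pairs given in the text, and the fall-back of returning the word unchanged is an assumption.

```python
# Toy lemmatization dictionary; the real dictionary contents are not given in the text.
LEMMAS = {
    "moving": "move",  # verb -> infinitive
    "cars": "car",     # noun -> singular
}


def lemmatize(word: str) -> str:
    """Return the base form of a word, or the lowercased word if it is not listed."""
    key = word.lower()
    return LEMMAS.get(key, key)


print(lemmatize("moving"), lemmatize("Cars"), lemmatize("water"))
```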
Synonym set: a set of synonymous words (verbs or nouns).
For example:
Synonym set: "marine vessel", "vessel", "watercraft".
Synonym expansion: a function that expands a word to the set of its synonymous lexical elements.
For example:
"vessel" is expanded to "marine vessel", "watercraft";
"heat" is expanded to "increase temperature".
Kind expansion: a function that expands a word to the set of lexical elements with a more general meaning (hypernyms) or a more specific meaning (hyponyms).
For example:
"marine vessel" is expanded to "craft" (more general meaning);
"marine vessel" is expanded to "craft", "ice yacht", "scooter" and so on (more specific meanings).
Association expansion: a function that expands a word to the set of lexical elements with a close meaning.
For example:
"regularization" is associated with "regulation", "quality control", "restraint" and so on.
Words in the ontology are grouped according to the following relations:
1) the "synonym" relation, comprising:
1a) the "direct synonym" relation;
1b) the "syntactic synonym" relation;
2) the "kind" relation (parent/child relation);
3) the "association" relation.
Relations (1a), (2) and (3) are characteristic of noun phrases; relations (1a), (1b) and (2) are characteristic of verb phrases.
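The three expansions defined above can be sketched over a toy ontology that stores relations (1a), (2) and (3) in separate tables. The table names `SYNSETS`, `PARENT` and `RELATED` and the function names are hypothetical; the entries are limited to examples quoted in the text, and narrower terms are derived by inverting the parent table rather than taken from the text's own list.

```python
# Relation (1a): sets of direct synonyms.
SYNSETS = [{"marine vessel", "vessel", "watercraft"}]
# Relation (2): child -> parent (more general meaning).
PARENT = {"marine vessel": "craft"}
# Relation (3): associated terms.
RELATED = {"regularization": ["regulation", "quality control", "restraint"]}


def synonyms(term: str) -> list:
    """Synonym expansion: the other members of the term's synonym set."""
    for synset in SYNSETS:
        if term in synset:
            return sorted(synset - {term})
    return []


def broader(term: str) -> list:
    """Kind expansion toward the more general meaning."""
    parent = PARENT.get(term)
    return [parent] if parent else []


def narrower(term: str) -> list:
    """Kind expansion toward more specific meanings (inverse of PARENT)."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


def associated(term: str) -> list:
    """Association expansion: terms with a close meaning."""
    return list(RELATED.get(term, []))


print(synonyms("vessel"), broader("marine vessel"), associated("regularization"))
```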
Relation (1b) denotes the following synonymy:
Verb 1 (Verb1) = Verb 2 (Verb2) + Parameter.
For example:
moisten = augment humidity;
heat = increase temperature.
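Relation (1b) can be read as a rewrite rule between a VO form and a VPO form. The sketch below applies it in both directions; the table name `SYNTACTIC`, the function name `syntactic_variants` and the tuple layout (verb, parameter, object) are illustrative assumptions, and the table holds only the two pairs given above.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, Optional[str], str]  # (verb, parameter or None, object)

# Verb1 -> (Verb2, Parameter), as in relation (1b).
SYNTACTIC = {
    "moisten": ("augment", "humidity"),
    "heat": ("increase", "temperature"),
}


def syntactic_variants(verb: str, parameter: Optional[str], obj: str) -> List[Triple]:
    """Return the input triple plus any equivalent form under relation (1b)."""
    variants: List[Triple] = [(verb, parameter, obj)]
    if parameter is None and verb in SYNTACTIC:
        verb2, param = SYNTACTIC[verb]
        variants.append((verb2, param, obj))       # VO -> VPO
    else:
        for verb1, pair in SYNTACTIC.items():
            if pair == (verb, parameter):
                variants.append((verb1, None, obj))  # VPO -> VO
    return variants


print(syntactic_variants("heat", None, "water"))
print(syntactic_variants("increase", "temperature", "water"))
```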
To populate ontology database 136, a special computer-based tool has been developed according to the invention that automatically analyzes the documents of domain experts.
Fig. 2 shows a fragment, tree diagram 200, taken from the ontology concept knowledge base 136. Except for nodes 244, 252 and 260, the phrases within each node of tree 200 stand in relation (1a) to one another; the edges of tree diagram 200 represent relations of type (2). Thus, any word in node 248 is a parent of any word in node 244. Finally, phrases in one node and phrases in another node on the same level stand in relation (3). For example, the single phrase in node 252 stands in relation (3) to any phrase in nodes 256, 260 and 264.
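The way a tree such as tree diagram 200 encodes the three relation types can be sketched as follows. The contents of Fig. 2 are not reproduced in the text, so the node contents below are hypothetical ("oxygen"-"gas" is borrowed from the kind-relation example, "nitrogen" is invented for illustration), and treating "same level" as "same parent" is an assumption based on the sibling definition given earlier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    phrases: Tuple[str, ...]         # phrases in one node are direct synonyms (1a)
    parent: Optional["Node"] = None  # an edge to the parent is a kind relation (2)


def relation(a: Node, b: Node) -> str:
    """Name the relation between phrases of node a and phrases of node b."""
    if a is b:
        return "direct synonym (1a)"
    if b.parent is a:
        return "parent/child (2)"
    if a.parent is b:
        return "child/parent (2)"
    if a.parent is not None and a.parent is b.parent:
        return "association (3)"  # siblings under the same parent
    return "unrelated"


# Hypothetical node contents, not taken from Fig. 2.
gas = Node(("gas",))
oxygen = Node(("oxygen",), parent=gas)
nitrogen = Node(("nitrogen",), parent=gas)

print(relation(gas, oxygen), "|", relation(oxygen, nitrogen))
```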
Expert knowledge base 140 in Fig. 1 is a knowledge base of technical solutions extracted from natural language documents (patents, papers, and so on). Expert knowledge base 140 is used only by the retrieval module. The title of each solution is expressed as a natural language sentence and comprises four fields corresponding to the basic concepts of SVPO (Subject/Verb/Parameter/Object). Note that the S field cannot be searched directly by the search engine; it serves only as the solution to a problem (the problem being defined by the VPO).
For retrieval to work correctly, the VPO fields must satisfy a number of requirements:
1. Each field must be expressed in canonical form:
■ Nouns in the parameter and object must be in the nominative singular, for example: "nanotube arrays" -> "nanotube array";
■ If the parameter or object contains an "of" phrase, it must be converted to a structure without the preposition, for example: "query of user" -> "user query";
■ The verb in the verb field must be in the infinitive, for example: "provided" -> "provide".
2. If a parameter or object field contains items joined by a conjunction, it is split at the conjunction into several parts, forming two or more parameters or objects, for example:
"polymers and copolymers" -> "polymer", "copolymer"
3. Parameter and/or object fields contain only simple noun phrases; all additional information is stripped:
"bowl containing water" -> object: "bowl"
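The field requirements above can be approximated with a few string rules. This is a deliberately naive sketch: the function names, the suffix-based singularization and the list of words that introduce "additional information" (`containing`, `with`, `having`) are assumptions made for illustration; a real implementation would rely on the lemmatization dictionary and parsing rules of the language knowledge base.

```python
import re


def singular(noun: str) -> str:
    """Naive singularization by suffix; an assumption, not the patent's method."""
    if noun.endswith("ies"):
        return noun[:-3] + "y"
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def normalize_field(text: str) -> list:
    """Normalize a parameter or object field into one or more canonical phrases."""
    results = []
    # Requirement 2: split at the conjunction.
    for part in re.split(r"\s+and\s+", text.strip().lower()):
        # Requirement 3: keep only the simple noun phrase before a modifier clause.
        part = re.split(r"\s+(?:containing|with|having)\s+", part)[0]
        # Requirement 1: "X of Y" becomes "Y X".
        match = re.fullmatch(r"(.+?)\s+of\s+(.+)", part)
        if match:
            part = f"{match.group(2)} {match.group(1)}"
        words = part.split()
        if not words:
            continue
        # Requirement 1: the head noun (assumed to be the last word) in the singular.
        words[-1] = singular(words[-1])
        results.append(" ".join(words))
    return results


for example in ("nanotube arrays", "query of user",
                "polymers and copolymers", "bowl containing water"):
    print(example, "->", normalize_field(example))
```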
An example of a technical solution is as follows:
Natural language form:
Accelerometer detects acceleration of magnetic head.
SVPO form:
S: accelerometer
V: detects
P: acceleration
O: magnetic head
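Because only the V, P and O fields are searched and the S field is returned as the answer, a lookup in the expert knowledge base can be sketched as a filter over SVPO records. The record layout, the name `find_subjects` and the use of the lemmatized verb "detect" in the stored record are assumptions for illustration.

```python
from typing import Dict, List

# One SVPO record based on the example above, with the verb stored as a lemma (assumption).
SOLUTIONS: List[Dict[str, str]] = [
    {"S": "accelerometer", "V": "detect", "P": "acceleration", "O": "magnetic head"},
]


def find_subjects(query: Dict[str, str], solutions: List[Dict[str, str]]) -> List[str]:
    """Return the S field of every record whose V/P/O fields match the query."""
    searchable = {key: value for key, value in query.items() if key in ("V", "P", "O")}
    return [record["S"] for record in solutions
            if all(record.get(key) == value for key, value in searchable.items())]


query = {"V": "detect", "P": "acceleration", "O": "magnetic head"}
print(find_subjects(query, SOLUTIONS))
```

Any "S" key in the query is ignored, which mirrors the statement that the subject field cannot be searched directly.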
In operation, the language processing module (LPM) in Fig. 1 uses the knowledge bases described above (language knowledge base 132, ontology knowledge base 136 and expert knowledge base 140). The user query is supplied to the LPM. The LPM first checks the validity of the user query. The VPO fields are mandatory in the query structure. Studies show that most inventive problems can be expressed in a form called a "technical function", that is, the VPO form, which is the formal characterization of a problem (for example, in "disk increases the depth of grinding", V is "increases", P is "depth" and O is "grinding"), while the solution to the problem is the performer of this technical function. A structured functional query (VPO) does not need semantic processing. An unstructured functional query undergoes semantic processing to obtain the VPO fields. The LPM performs this semantic processing (108) using a set of rules that describe a context analysis model. The processing algorithm does not require many resources to implement. It is very effective on unstructured user queries because of the following restrictions:
The absence of a subject (S); and
The absence of more complex sentence structure, since all input sentences are in the imperative mood.
The following are examples of semantic processing of unstructured queries:
Example 1:
Query: How to test fatigued metals?
Structured form: V (test) O (fatigued metal)
Example 2:
Query: How to measure mechanical properties of MEMS material?
Structured form: V (measure) P (mechanical property) O (MEMS material)
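The two examples can be reproduced with a deliberately naive rule-based parser, shown here only to make the V/P/O extraction concrete. It assumes that every query has the shape "How to <verb> <parameter> of <object>?" or "How to <verb> <object>?", and it replaces real lemmatization with a crude singularization of the head noun; the function names are hypothetical and the LPM rules in the patent are not reproduced.

```python
import re

QUERY_PATTERN = re.compile(r"how\s+to\s+(\w+)\s+(.+?)\??$", re.IGNORECASE)


def singularize(word: str) -> str:
    """Very naive singularization; stands in for dictionary-based lemmatization."""
    if word.isupper():                      # leave acronyms such as "MEMS" alone
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize_phrase(phrase: str) -> str:
    """Singularize only the head noun, assumed to be the last word."""
    words = phrase.split()
    if not words:
        return phrase
    words[-1] = singularize(words[-1])
    return " ".join(words)


def parse_query(query: str) -> dict:
    """Extract V, optional P, and O from a 'How to ...' query."""
    match = QUERY_PATTERN.match(query.strip())
    if match is None:
        raise ValueError("query does not follow the 'How to <verb> ...' pattern")
    verb, rest = match.group(1).lower(), match.group(2)
    head, sep, tail = rest.partition(" of ")
    if sep:  # "<parameter> of <object>"
        return {"V": verb, "P": normalize_phrase(head), "O": normalize_phrase(tail)}
    return {"V": verb, "O": normalize_phrase(rest)}


print(parse_query("How to test fatigued metals?"))
print(parse_query("How to measure mechanical properties of MEMS material?"))
```

Splitting at the first " of " is a simplification; a query such as "change the direction of movement of the gas flow" would need the knowledge bases to decide where the parameter ends.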
Note that during processing the LPM uses not only the semantic knowledge base (132) but may also use the ontology knowledge base (136), which supplies noun and verb phrase terms and thereby improves processing performance. An analyzed user query takes a formal VPO structure. Its fields must be lemmatized, as are those in expert knowledge base 140.
The user query in VPO form can then be submitted to the query expansion module (116), which uses the hierarchy of ontology knowledge base 136 to perform semantic term expansion. The result is used later, when searching expert knowledge base 140, to retrieve as many solutions relevant to the problem as possible.
Fig. 3 depicts one embodiment of the invention and shows a structural and functional diagram of the expansion module of the LPM. In block diagram 300, the user query 368 in VPO form is expanded using any of several expansion methods. Accordingly, any of the following expansions may be performed (as shown in Fig. 3):
Synonym expansion 372 (verb, parameter and object are expanded);
Kind-of expansion 376 (expansion to broader and narrower terms; object only); and/or
Association expansion 380 (object only).
In synonym expansion 372, each field of the user query (VPO) can be replaced by its synonyms, both direct synonyms and morphological synonyms.
For example:
Input (user query): change dimensions of a solid body
VPO form: V (change), P (dimension), O (body)
Output (synonym expansion):
V (change, alter, modify, vary)
P (dimension, proportion, size)
O (body, organic structure, physical structure)
Note that morphological synonym expansion (V -> VP or VP -> V) also yields synonym terms.
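Synonym expansion of a VPO query can be sketched as a dictionary lookup per field. The `SYNONYMS` table below is a hypothetical stand-in for the ontology knowledge base and contains only the terms from the example; morphological (V -> VP) synonyms are not modelled.

```python
# Hypothetical synonym table standing in for the ontology knowledge base.
SYNONYMS = {
    "change": ["alter", "modify", "vary"],
    "dimension": ["proportion", "size"],
    "body": ["organic structure", "physical structure"],
}


def expand_synonyms(query: dict) -> dict:
    """Map each VPO field to the original term followed by its synonyms."""
    return {field: [term] + SYNONYMS.get(term, []) for field, term in query.items()}


expanded = expand_synonyms({"V": "change", "P": "dimension", "O": "body"})
for field, terms in expanded.items():
    print(field, terms)
```

Keeping the original term first in each list makes it easy to tell an unexpanded match from a synonym match later on.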
Kind-of expansion (376) replaces a term in a query field with terms that stand in a hierarchical relation to it (parent or child relation). There are two kinds of kind-of expansion:
■ From a specific term to a more general term (bottom-up)
For example:
Input (user query): change the surface curvature of the conducting liquid drop
VPO form: V (change) P (surface curvature) O (conducting liquid drop)
Output: (parent-relation expansion of the object O only)
O (round shape, small indefinite amount)
■ From a general term to a more specific term (top-down)
For example:
Input (user query): change the direction of movement of the gas flow
VPO form: V (change) P (direction) O (movement)
Output: (child-relation expansion of the object O only)
O (abduction, adduction, flit, dart, circumduction, inclination, retraction, retrofection, rotation, vibration, …)
Kind-of expansion 376 makes it possible to retrieve more specific and more general solutions that are relevant to the problem.
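A minimal sketch of this hierarchical expansion follows, assuming the ontology is available as two lookup tables, one from a term to its parent terms and one from a term to its child terms. The tables `PARENTS` and `CHILDREN` and the function name are hypothetical and hold only a few terms from the examples above.

```python
# Hypothetical fragments of the ontology hierarchy, taken from the examples.
PARENTS = {"conducting liquid drop": ["round shape", "small indefinite amount"]}
CHILDREN = {"movement": ["abduction", "adduction", "flit", "dart", "rotation", "vibration"]}


def expand_kind(query: dict, direction: str) -> dict:
    """Replace the object O by its parent terms ("up") or child terms ("down").

    The verb and parameter fields are copied unchanged, since this
    expansion is applied to the object only.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    table = PARENTS if direction == "up" else CHILDREN
    expanded = dict(query)
    expanded["O"] = list(table.get(query["O"], []))
    return expanded


bottom_up = expand_kind(
    {"V": "change", "P": "surface curvature", "O": "conducting liquid drop"}, "up")
top_down = expand_kind({"V": "change", "P": "direction", "O": "movement"}, "down")
print(bottom_up["O"])
print(top_down["O"])
```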
Association expansion (380) replaces a term in a query field with terms that stand in an associative relation to it.
For example:
Input (user query): measure traveling distance
VPO form: V (measure) O (traveling distance)
Output: (association expansion of the object O only)
O (light time, skip distance, wingspan, wingspread, object distance, migration distance, migration length, altitude, …)
Association expansion 380 makes it possible to search for problems analogous to the query (analogous solutions). The expanded user query is shown as 384 in the figure.
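The three expansion types can be combined into a single expanded object field in which every term remembers the relation that produced it. The sketch below is one way to do that; the `OntologyEntry` type, the relation labels and the sample data are assumptions for illustration, not structures defined in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyEntry:
    """Hypothetical ontology record: related terms grouped by relation type."""
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    associated: list = field(default_factory=list)


# Sample data taken from the association-expansion example.
ONTOLOGY = {
    "traveling distance": OntologyEntry(
        associated=["light time", "skip distance", "wingspan", "altitude"]),
}


def expand_object(term: str, ontology: dict) -> dict:
    """Return every expanded term mapped to the relation that produced it."""
    entry = ontology.get(term, OntologyEntry())
    expanded = {term: "original"}
    groups = (("synonym", entry.synonyms), ("parent", entry.parents),
              ("child", entry.children), ("associated", entry.associated))
    for relation, terms in groups:
        for related in terms:
            # Keep the first (closest) relation if a term appears twice.
            expanded.setdefault(related, relation)
    return expanded


print(expand_object("traveling distance", ONTOLOGY))
```

Recording the relation alongside each term is what later allows retrieved solutions to be sorted into classes by semantic relation.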
The solution search module (120) searches the expert knowledge base (140) for solutions that match the expanded query from module 116 and lists the solutions found (128). The search engine compares the VPO fields stored in expert knowledge base 140 with the corresponding fields of the expanded query (372/376/384) produced by query expansion module 116/300, and retrieves the solutions whose fields correspond.
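The field-by-field comparison can be sketched as follows, assuming each stored solution is a dictionary of S/V/P/O fields and the expanded query maps each field to the set of acceptable terms. The sample records reuse solutions that appear as examples in this description; the function names and data layout are hypothetical.

```python
# Sample expert knowledge base: each solution is a dict of S/V/P/O fields.
KNOWLEDGE_BASE = [
    {"S": "alkali", "V": "neutralize", "O": "hydrochloric acid"},
    {"S": "alkali", "V": "neutralize", "O": "acid"},
    {"S": "coil", "V": "increase", "P": "temperature", "O": "water"},
]


def matches(solution: dict, expanded_query: dict) -> bool:
    """True if every query field is present in the solution with an accepted term."""
    return all(solution.get(name) in terms for name, terms in expanded_query.items())


def search(knowledge_base: list, expanded_query: dict) -> list:
    """Return the solutions whose V/P/O fields match the expanded query."""
    return [solution for solution in knowledge_base if matches(solution, expanded_query)]


expanded_query = {
    "V": {"neutralize"},
    "O": {"hydrochloric acid", "acid", "nitric acid"},
}
for hit in search(KNOWLEDGE_BASE, expanded_query):
    print(hit)
```

The subject field S is never compared, because it is the part of the solution the user is looking for.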
Solutions in expert knowledge base 140 that are related in some way to the expanded query are extracted as search results and presented to the user. Because these solutions differ in nature, they need to be classified by semantic relation (according to expansion type 372/376/380). All solutions from the expert knowledge base are divided into the following classes by semantic relation:
1. Exact solutions: the VO/VPO fields of these solutions fully match the VO/VPO fields originally formed from the query.
For example:
User query: V (heat) O (water)
Solution: S (coil) V (increase) P (temperature) O (water)
2. Special-case solutions: at least one of the VO/VPO fields of these solutions is a special case of the corresponding field in the query.
For example:
User query: V (neutralize) O (hydrochloric acid)
Solution: S (alkali) V (neutralize) O (hydrochloric acid)
3. General solutions: at least one of the VO/VPO fields of these solutions is a more general concept of the corresponding field in the query.
For example:
User query: V (neutralize) O (hydrochloric acid)
Solution: S (alkali) V (neutralize) O (acid)
4. Analogous solutions: at least one of the VO/VPO fields of these solutions is a concept associated with the corresponding field in the query.
For example:
User query: V (neutralize) O (hydrochloric acid)
Solution: S (alkali) V (neutralize) O (nitric acid)
In the examples above, S stands for the "subject", that is, the idea behind the solution to the problem.
The classification algorithm by solution type is given in the two tables below (for the VPO form and the VO form respectively). The symbols used are:
S: the original term or one of its synonyms;
H: a child-relation term;
R: a parent-relation term;
C: an associated term;
Exact: exact match of the term;
Partial: partial match (using the leftmost-word truncation algorithm);
Any: exact or partial match;
∈: "belongs to, is contained in".
Table 1
Solution type | Verb (V) | Parameter (P) | Object (O) | Other condition
Exact         | S-Exact  | S-Exact       | S-Exact    |
Special case  | S-Exact  | S-Exact       | SH-Exact   | O∈H-Exact
General 1     | S-Exact  | S-Exact       | SHR-Exact  | O∈R-Exact
General 2     | S-Exact  | S-Any         | SHR-Any    | P∈S-Any & O∈SHR-Partial
Analogous     | S-Exact  | S-Any         | SHRC-Any   | O∈C-Any
Table 2
Solution type | Verb (V) | Parameter (P) | Object (O) | Other condition
Exact         | S-Exact  |               | S-Exact    |
Special case  | S-Exact  |               | SH-Exact   | O∈H-Exact
General 1     | S-Exact  |               | SHR-Exact  | O∈R-Exact
General 2     | S-Exact  |               | SHR-Any    | O∈SHR-Partial
Analogous     | S-Exact  |               | SHRC-Any   | O∈C-Any
For example, in the Analogous row of Table 1, the Verb column reads "S-Exact": the verb field may only be the input verb or a synonym of it (S), matched exactly (Exact). The parameter field likewise admits only synonym expansion. The object field may contain any semantically expanded term (SHRC-Any). The other condition requires that the object field contain an associated term (C).
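A simplified version of the VO-form classification can be sketched as below. It is an illustration under stated assumptions only: matching is exact throughout, so the partial-match row (General 2) is left out, and the ontology fragment, verb synonym table and function name are hypothetical. The relation tags follow the symbols used in the tables: S (synonym), H (child term), R (parent term), C (associated term).

```python
from typing import Optional

# Hypothetical ontology fragment: related object terms grouped by relation tag.
ONTOLOGY = {
    "hydrochloric acid": {"S": set(), "H": set(), "R": {"acid"}, "C": {"nitric acid"}},
    "acid": {"S": set(), "H": {"hydrochloric acid", "nitric acid"}, "R": set(), "C": set()},
}
VERB_SYNONYMS = {"neutralize": {"neutralise"}}

CLASS_BY_RELATION = {"S": "exact", "H": "special case", "R": "general", "C": "analogous"}


def classify(query: dict, solution: dict) -> Optional[str]:
    """Classify a VO-form solution relative to a VO-form query, or return None."""
    # Verb column: S-Exact, i.e. the query verb or one of its synonyms.
    allowed_verbs = {query["V"]} | VERB_SYNONYMS.get(query["V"], set())
    if solution["V"] not in allowed_verbs:
        return None
    if solution["O"] == query["O"]:
        return "exact"
    # Object column: check the relations from closest to most distant.
    relations = ONTOLOGY.get(query["O"], {})
    for tag in ("S", "H", "R", "C"):
        if solution["O"] in relations.get(tag, set()):
            return CLASS_BY_RELATION[tag]
    return None


query = {"V": "neutralize", "O": "hydrochloric acid"}
print(classify(query, {"S": "alkali", "V": "neutralize", "O": "acid"}))
print(classify(query, {"S": "alkali", "V": "neutralize", "O": "nitric acid"}))
```

Checking the relations in the order S, H, R, C means a solution is always placed in the closest class it qualifies for.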
As the tables show, there are two rows for general solutions. General 1 covers solutions obtained by semantic expansion of the original term without truncation. General 2 covers solutions obtained after applying the leftmost-word truncation algorithm.
The leftmost-word truncation algorithm works as follows. If no exact match for the input term is found in the ontology knowledge base, the leftmost word is deleted and the remaining term is looked up in the ontology knowledge base again. This is repeated until a match is found or only one word remains. In either case, the truncated term is generally regarded as a more general concept than the original term.
For example: "photosensitive resin composition" becomes "resin composition" after one truncation and "composition" after a second truncation.
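The truncation loop described above can be written directly. In this sketch the ontology knowledge base is reduced to a plain set of known terms, and the function name and return shape are assumptions made for illustration.

```python
def truncate_to_known_term(term: str, ontology_terms: set) -> tuple:
    """Drop leftmost words until the term is known or one word remains.

    Returns the resulting term and a flag saying whether it was found.
    """
    words = term.split()
    while len(words) > 1 and " ".join(words) not in ontology_terms:
        words = words[1:]  # delete the leftmost word and look up again
    candidate = " ".join(words)
    return candidate, candidate in ontology_terms


print(truncate_to_known_term("photosensitive resin composition", {"composition"}))
print(truncate_to_known_term("photosensitive resin composition",
                             {"resin composition", "composition"}))
```

The flag distinguishes a genuine match from the case where the loop stopped only because a single word was left.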
In one embodiment of the invention, a computer and/or communication system may be used. In one embodiment, a computer may serve as any of several computing devices, such as, but not limited to, the knowledge bases of the invention. The computer may be, but is not limited to, a personal computer, a workstation, a mobile device, a telephone, a personal digital assistant, a thin client, a thick client, a network appliance, an Internet browser, a pager, an alarm device, a television, an interactive television, a receiver, a tuner, a high-definition television, a high-definition television receiver, a video-on-demand system, a server, or another device.
In one embodiment of the invention, the computer includes a central processing unit connected to a bus. The processor can access memory over the bus. The computer may be connected to an input/output system, for example a network interface device or a modem connected to a network. The computer may also be connected to secondary storage, either over the bus or through main memory. Secondary storage may include a disk storage unit or another storage medium. In one embodiment the disk storage unit includes, but is not limited to, a magnetic storage device such as a hard disk, or an optical storage device such as a write-once read-many (WORM) drive, a compact disc (CD), or a magneto-optical device. Another type of secondary storage is a removable disk storage device used together with a removable storage medium, for example a CD-ROM or a floppy disk. In general, the disk storage unit may store an application program for operating the computer system. The disk storage unit may also store database files. The computer communicates with the input/output system and the disk storage unit over the bus. The bus may also be connected to an output display and to input devices such as, but not limited to, a keyboard and a mouse or other pointing or selection device.
In this patent, "computer program medium" and "computer-readable medium" generally refer to, but are not limited to, removable storage media, a hard disk installed in a hard disk drive, signal media, and the like. These computer program products provide software to the computer system. The scope of protection of the invention also covers these computer program products.
The embodiments described are particular cases of carrying out the invention; the scope of protection of the invention is not limited to them.
In the present invention, "connected" and "coupled" may both mean that two or more elements are directly linked to each other; "coupled" may also mean that two or more elements are not directly linked but still cooperate or interact with each other.
An algorithm here means a self-consistent sequence of acts or operations leading to a desired result. These may include a large number of physical operations. The quantities operated on may take the form of electrical or magnetic signals that can be stored, transferred, combined, compared and otherwise manipulated. These signals may be referred to as bits, values, elements, symbols, characters, terms, numbers, or the like.
Processing, computing, determining and the like, as described in the present invention, all refer to operations on and transformations of data.
A computing platform may include one or more processors.
Embodiments of the invention include apparatuses for performing these operations.
Although some embodiments of the invention have been described above, it should be understood that these embodiments are only specific examples of carrying out the invention and should not limit its scope of protection. The scope of protection of the invention should not be limited by the description in the specification, but by the claims and their equivalents. Changes that those skilled in the art make to the embodiments on the basis of the above description all fall within the scope of protection of the invention.

Claims (26)

1. A method of solving a problem, comprising:
storing a user query, the query comprising at least one structured or unstructured problem description;
parsing the unstructured user query to form a structured user query, the structured user query comprising a formal semantic representation of the user query;
expanding the formal semantic representation of the user query to provide at least one expanded query relevant to the problem;
searching an expert knowledge base using the expanded query; and
classifying the solutions found in the expert knowledge base by semantic relation.
2. The method according to claim 1, wherein the structured user query is represented semantically in verb-parameter-object (VPO) form.
3. The method according to claim 1, wherein the problem comprises at least an inventive problem or a user's technical problem.
4. The method according to claim 1, wherein the unstructured user query is expressed in natural language.
5. The method according to claim 1, wherein the semantic expansion comprises searching for at least one problem expression relevant to the problem.
6. The method according to claim 1, wherein the parsing step comprises:
performing lexical and grammatical analysis of the unstructured user query; and
performing semantic analysis of the unstructured query to generate the formal semantic representation of the user query.
7. The method according to claim 6, wherein the lexical and grammatical analysis identifies the nouns and verbs of the user query on the basis of at least a semantic knowledge base or an ontology knowledge base.
8. The method according to claim 6, wherein the lexical and grammatical analysis generates the formal semantic representation of the user query in verb-parameter-object form on the basis of a semantic algorithm for lexical and grammatical analysis of the user query, a lemmatization dictionary, noun and verb classification, and a list of parameters from the semantic knowledge base.
9. The method according to claim 1, wherein the semantic expansion is performed on the basis of the ontology knowledge base and generates a semantically expanded form of the formal representation of the user query.
10. The method according to claim 9, wherein the ontology knowledge base comprises a database containing at least one term, or a relation of the at least one term between two or more different knowledge domains.
11. The method according to claim 10, wherein the at least one term comprises at least one word concept or one verb concept.
12. The method according to claim 10, wherein the relations of the verb term comprise at least a direct synonym or a syntactic synonym.
13. The method according to claim 10, wherein the relations of the word term comprise at least a direct synonym, a kind-of relation, or an association relation.
14. The method according to claim 9, wherein the ontology knowledge base comprises at least one term or tool enabling experts in different fields to edit or enrich the ontology knowledge base.
15. The method according to claim 9, wherein the semantic expansion comprises a list of search patterns for the formal semantic representation of the user query.
16. The method according to claim 15, wherein each search pattern in the list comprises a relevance value of that search pattern relative to the formal semantic representation.
17. The method according to claim 1, wherein the searching comprises searching the expert knowledge base on the basis of the search patterns in the list to determine all solutions relevant to the problem.
18. The method according to claim 17, wherein the expert knowledge base is a knowledge base of technical solutions extracted from natural-language documents.
19. The method according to claim 18, wherein the natural-language documents comprise at least one patent or one article.
20. The method according to claim 18, wherein the technical solution is a natural-language sentence represented in subject-verb-parameter-object form.
21. The method according to claim 17, wherein the expert knowledge base provides terms and tools enabling experts in different fields to edit and enrich the expert knowledge base.
22. The method according to claim 17, wherein determining all relevant solutions means determining only those technical solutions in the expert knowledge base whose verb, parameter and object fields match the corresponding fields of a search pattern.
23. The method according to claim 22, wherein the technical solution is assigned a relevance value similar to the relevance value of the search pattern.
24. The method according to claim 1, wherein the solutions are classified semantically on the basis of the type of the semantic representation and the relevance value of the technical solution.
25. The method according to claim 24, wherein the classes of solutions comprise: exact solutions, special-case solutions, general solutions and analogous solutions.
26. The method according to claim 25, wherein, within each sub-list of a solution class, solutions are sorted by the type of the semantic representation and the relevance value of the technical solution.
CNB2004100783370A 2004-09-24 2004-09-24 Method of solving problem using wikipedia and user inquiry treatment technology Expired - Fee Related CN100361126C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100783370A CN100361126C (en) 2004-09-24 2004-09-24 Method of solving problem using wikipedia and user inquiry treatment technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100783370A CN100361126C (en) 2004-09-24 2004-09-24 Method of solving problem using wikipedia and user inquiry treatment technology

Publications (2)

Publication Number Publication Date
CN1752966A true CN1752966A (en) 2006-03-29
CN100361126C CN100361126C (en) 2008-01-09

Family

ID=36679820

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100783370A Expired - Fee Related CN100361126C (en) 2004-09-24 2004-09-24 Method of solving problem using wikipedia and user inquiry treatment technology

Country Status (1)

Country Link
CN (1) CN100361126C (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008131607A1 (en) * 2007-04-28 2008-11-06 Iatopia Group Limited A system and method for intelligent ontology based knowledge search engine
CN102160079A (en) * 2008-09-19 2011-08-17 摩托罗拉移动公司 Selection of associated content for content items
CN102662929A (en) * 2012-03-20 2012-09-12 中南大学 Method and device for intelligent problem analyzing and processing based on ontology
CN103412866A (en) * 2013-06-14 2013-11-27 杜向阳 High-intelligent search engine capable of conducting inspiration thinking and intuitive thinking
CN105468933A (en) * 2014-08-28 2016-04-06 深圳先进技术研究院 Biological data analysis method and system
CN105653660A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Association method and device of retrieval keyword
CN106980624A (en) * 2016-01-18 2017-07-25 阿里巴巴集团控股有限公司 A kind for the treatment of method and apparatus of text data
CN107016012A (en) * 2015-09-11 2017-08-04 谷歌公司 Handle the failure in processing natural language querying
CN110933952A (en) * 2017-03-30 2020-03-27 施耐德电气美国股份有限公司 Semantic search and rule method for distributed data system
CN114443425A (en) * 2022-01-10 2022-05-06 浪潮软件集团有限公司 Server operating system log diagnosis system and method based on Jieba weight calculation and feature scoring sorting algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6076051A (en) * 1997-03-07 2000-06-13 Microsoft Corporation Information retrieval utilizing semantic representation of text
US20020010574A1 (en) * 2000-04-20 2002-01-24 Valery Tsourikov Natural language processing and query driven information retrieval
CN1335574A (en) * 2001-09-05 2002-02-13 罗笑南 Intelligent semantic searching method
CN1521661A (en) * 2003-01-29 2004-08-18 黄致辉 Method for information retrieval by using natural language processing function

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008131607A1 (en) * 2007-04-28 2008-11-06 Iatopia Group Limited A system and method for intelligent ontology based knowledge search engine
CN102160079B (en) * 2008-09-19 2016-05-04 摩托罗拉移动公司 Be used for the selection of the content association of content item
CN102160079A (en) * 2008-09-19 2011-08-17 摩托罗拉移动公司 Selection of associated content for content items
CN102662929A (en) * 2012-03-20 2012-09-12 中南大学 Method and device for intelligent problem analyzing and processing based on ontology
CN103412866A (en) * 2013-06-14 2013-11-27 杜向阳 High-intelligent search engine capable of conducting inspiration thinking and intuitive thinking
CN105468933B (en) * 2014-08-28 2018-06-15 深圳先进技术研究院 biological data analysis method and system
CN105468933A (en) * 2014-08-28 2016-04-06 深圳先进技术研究院 Biological data analysis method and system
CN107016012A (en) * 2015-09-11 2017-08-04 谷歌公司 Handle the failure in processing natural language querying
CN105653660A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Association method and device of retrieval keyword
CN106980624A (en) * 2016-01-18 2017-07-25 阿里巴巴集团控股有限公司 A kind for the treatment of method and apparatus of text data
CN110933952A (en) * 2017-03-30 2020-03-27 施耐德电气美国股份有限公司 Semantic search and rule method for distributed data system
CN110933952B (en) * 2017-03-30 2024-02-27 施耐德电气美国股份有限公司 Method for semantic search and rules for distributed data systems
CN114443425A (en) * 2022-01-10 2022-05-06 浪潮软件集团有限公司 Server operating system log diagnosis system and method based on Jieba weight calculation and feature scoring sorting algorithm

Also Published As

Publication number Publication date
CN100361126C (en) 2008-01-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: Liu Tonghao

Document name: Notice of first review

C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP02 Change in the address of a patent holder

Address after: 100025 Beijing city Chaoyang District eight Li Zhuang in 1 Lai Jin TownCN-08

Patentee after: Yiweixun Science and Technology Co., Ltd., Beijing

Address before: 100026 Beijing city Chaoyang District West Road No. 1 A Winterless center block 5A

Patentee before: Yiweixun Science and Technology Co., Ltd., Beijing

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080109

Termination date: 20200924
