CN107122421A - Information retrieval method and device - Google Patents
- Publication number: CN107122421A
- Application number: CN201710217499.5A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses an information retrieval method and device in the Internet field, intended to improve the accuracy of the results retrieved for a user's problem to be solved. The method includes: receiving an input problem to be solved; determining the technical field to which the problem to be solved belongs; determining, according to a pre-established knowledge base for the technical field, a target document in the technical field that matches the problem to be solved, where the knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects, each knowledge object being selected from a part of a problem object; and returning the target document. The application is used to answer problems to be solved.
Description
Technical field
This application relates to the Internet field, and in particular to an information retrieval method and device.
Background
With the rapid development of the Internet, users increasingly obtain answers to their problems by asking questions online. After receiving a user's question, a search engine retrieves on the basis of one or more keywords that appear in the question and returns the results that match those keywords.
For a machine, however, understanding a human question is very difficult. Results obtained in this way are often not what the user wanted, so retrieval accuracy is low.
Summary of the invention
The embodiments of the present application provide an information retrieval method and device to improve the accuracy of the results retrieved for a user's problem to be solved. The technical solution is as follows.
In one aspect, an information retrieval method is provided, the method including:
receiving an input problem to be solved;
determining the technical field to which the problem to be solved belongs;
determining, according to a pre-established knowledge base for the technical field, a target document in the technical field that matches the problem to be solved, where the knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects, each knowledge object being selected from a part of a problem object;
returning the target document.
In another aspect, an information retrieval device is provided, the information retrieval device including:
an interface module, configured to receive an input problem to be solved;
a processing module, configured to determine the technical field to which the problem to be solved belongs;
the processing module being further configured to determine, according to a pre-established knowledge base for the technical field, a target document in the technical field that matches the problem to be solved, where the knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects, each knowledge object being selected from a part of a problem object;
the interface module being further configured to return the target document.
The technical solution provided by the embodiments of the present application has the following beneficial effect:
When retrieving on the basis of a user's problem to be solved (that is, the user's question), the method considers not only one or more keywords in the problem but also the technical field of the problem. By taking the technical field into account and using a domain-specific knowledge base built in advance, the accuracy of the results retrieved for the user's problem can be greatly improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of a four-layer knowledge graph for a particular technical field, provided by an embodiment of the present application;
Fig. 2 is an exemplary relationship diagram of problem nodes, knowledge nodes, and document nodes, provided by an embodiment of the present application;
Fig. 3 is a flow chart of an exemplary information retrieval method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of an exemplary information retrieval method provided by an embodiment of the present application;
Fig. 5 is a relationship diagram of nodes showing the random walk probabilities between nodes, provided by an embodiment of the present application;
Fig. 6 is a structural block diagram of an exemplary information retrieval device provided by an embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The "electronic device" referred to herein can include a smartphone, tablet computer, smart TV, e-book reader, MP3 player (Moving Picture Experts Group Audio Layer III), MP4 player (Moving Picture Experts Group Audio Layer IV), laptop computer, desktop computer, and the like. The "information retrieval device" referred to herein can be one or more servers or the like.
Related information retrieval methods consider only the keywords that appear in a problem, and so often fail to understand the user's intent. To understand a problem, humans usually draw on basic knowledge of the relevant technical field. Take the problem "when the user tries to send some special forms in the outbox, the program hangs in a waiting state". We first notice "special forms" and "outbox", both of which are parts of the product Outlook, and can therefore infer that the problem arises in Outlook.
This analysis shows that background knowledge of the technical field plays an important role in understanding a problem. This application helps machines understand user problems by building a knowledge base for a particular technical field.
The information retrieval method in this application is based on a knowledge base built in advance. The knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects. A problem object can be a problem to be solved that a user has input, a knowledge object can be selected from a part of such a problem, and a document object can be a document that solves such a problem.
To make the knowledge base easier to understand, the description below presents its parts and their relationships in the form of a knowledge graph.
A technical problem generally consists of three parts: a product, a component, and event words. Accordingly, the knowledge graph in this application can include four layers: a concept layer, a product layer, a component layer, and an event layer. Specifically:
Concept layer: each node represents a concept, and a concept represents a group of products with the same function. A concept is often also a sub-concept of another concept.
Product layer: this layer contains all products and their attributes. The product layer is the core of the whole knowledge graph; each of its nodes represents a specific product or an attribute of a product. Several product attributes can be predefined, such as version, language, and running environment.
Component layer: a technical problem usually concerns some component of a product, and the component layer contains the components of all products.
Event layer: once the product or component has been identified, the specific symptom of the problem must be understood. The event layer contains the nouns, verbs, adjectives, and other words that describe the symptom.
An example knowledge graph is shown in Fig. 1, where dashed lines divide the graph, from top to bottom, into four layers: the concept layer, product layer, component layer, and event layer.
The knowledge graph is built from a technical corpus; the construction method is described below.
Concept layer and product layer
Concepts and products are extracted from product information. For example, 6052 products were obtained in total, belonging to 214 different categories. Product attributes are extracted with predefined rules. For example, "Office Pro Win32IT" indicates that the product name is Office, the version is Pro, the language is Italian, and the product is installed on a 32-bit Windows operating system.
Component layer
Components are extracted from the technical corpus and from users' problem logs. First, sequence labeling methods identify the components mentioned in the corpus. The extracted phrases become nodes of the component layer, and the association between a product and a component is measured by their pointwise mutual information (PMI). PMI is a common measure of how strongly two phrases are associated: if the PMI of a component c and a product p exceeds a threshold, c is considered a component of p. PMI is defined as:
$$\mathrm{PMI}(p, c) = \log \frac{P(p, c)}{P(p)\,P(c)}$$
where the probabilities are estimated from counts: #(c) is the number of occurrences of c, #(p) is the number of occurrences of p, and #(p, c) is the number of co-occurrences of p and c.
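The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the dictionary layout, and the `total` normalizer (the number of observation units, such as sentences) are assumptions, since the text only names the counts #(p), #(c), and #(p, c).

```python
import math

def pmi(count_pc, count_p, count_c, total):
    """Standard PMI of product p and component c, estimated from counts.

    `total` (number of observation units) is an assumption; the source
    does not say how the counts are normalized.
    """
    if count_pc == 0:
        return float("-inf")  # the pair never co-occurs
    p_pc = count_pc / total
    p_p = count_p / total
    p_c = count_c / total
    return math.log(p_pc / (p_p * p_c))

def components_of(product, counts, pair_counts, total, threshold):
    """Return the components whose PMI with `product` exceeds `threshold`."""
    return sorted(
        c for (p, c), n in pair_counts.items()
        if p == product and pmi(n, counts[p], counts[c], total) > threshold
    )

# Toy counts (made up for illustration only).
counts = {"outlook": 100, "outbox": 50, "button": 400}
pair_counts = {("outlook", "outbox"): 40, ("outlook", "button"): 40}
print(components_of("outlook", counts, pair_counts, 1000, threshold=1.0))
```

With these toy counts, "outbox" co-occurs with "outlook" far more often than chance would predict, while "button" does not, so only "outbox" passes the threshold.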
Event layer
The event layer has two kinds of edges, "event word of" (EventWordOf) and "related to" (RelatedTo), which are discussed in turn. An EventWordOf edge connects a product to a word describing an action; such relations are extracted using PMI, in the same way as for the component layer. Users typically describe the symptom of a problem with verbs, adjectives, adverbs, nouns, and so on. Given a large technical corpus, a mature part-of-speech (POS) tagging method is first used to label the part of speech of each word in the corpus. It is further assumed that if two technical problems can be solved by the same document, they should be semantically very similar. For example, suppose document d solves the following three technical problems:
q2: Outlook 2007 gets frozen.
q9: Outlook sending status remains for hours.
q15: Emails get stuck in outbox.
From this, the words "frozen", "remain", and "stuck" can be inferred to be semantically similar, so the event nodes corresponding to these three words can be connected by RelatedTo edges.
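A minimal sketch of how RelatedTo edges could be derived from problems that share a solving document is shown below. The two input dictionaries and the one-event-word-per-problem simplification are hypothetical; the source does not fix a data layout.

```python
from itertools import combinations

def related_to_edges(solved_by, event_words):
    """Connect event words of problems that are solved by the same document.

    solved_by: document id -> list of problem ids it solves (hypothetical layout).
    event_words: problem id -> event word extracted from that problem
    (one word per problem, a simplification made for this sketch).
    """
    edges = set()
    for problems in solved_by.values():
        words = sorted({event_words[q] for q in problems if q in event_words})
        for a, b in combinations(words, 2):
            edges.add((a, b))  # undirected edge stored as a sorted pair
    return edges

solved_by = {"d": ["q2", "q9", "q15"]}
event_words = {"q2": "frozen", "q9": "remain", "q15": "stuck"}
print(sorted(related_to_edges(solved_by, event_words)))
```

For the three example problems solved by document d, the sketch links "frozen", "remain", and "stuck" pairwise.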
So that the target documents associated with a user's problem to be solved can be returned, this application associates knowledge objects with document objects. Document objects can be obtained from technical problem logs collected on the network.
An example of the connections among problem objects, knowledge objects, and document objects is shown in Fig. 2. Each node in Fig. 2 can represent one object: a problem object, a knowledge object, or a document object. The problem to be solved q in Fig. 2 is "some specific Custom forms get stuck in Outbox when users send it". Document d1 is "Description of 2007 Microsoft Office Suite SP2".
Fig. 2 contains three types of edges: edges from a problem node to a knowledge node, edges between two knowledge nodes, and edges from a knowledge node to a document node. For a given problem node, the edges from that node to each of its knowledge nodes carry the same weight. The weight of an edge between two knowledge nodes can be expressed as a conditional probability: the weight from node x to node y is the probability that y occurs given that x occurs:
$$w(x \to y) = P(y \mid x) = \frac{\#(x, y)}{\#(x)}$$
where #(x, y) is the number of co-occurrences of x and y, and #(x) is the number of occurrences of x.
The weight of an edge from a knowledge node x to a document node d can be expressed as:
$$w(x \to d) = \frac{\left|\{\, q \in QL(d) : x \in q \,\}\right|}{\left|QL(d)\right|}$$
where the numerator is the number of problems that can be solved by d and contain x, the denominator is the number of all problems that can be solved by d, and QL(d) denotes the set of all problems that can be solved by d.
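The two weight definitions can be illustrated with a short Python sketch. It assumes each logged problem has already been reduced to a set of knowledge nodes and that co-occurrence is counted once per problem; both are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def knowledge_edge_weights(problems):
    """Weight of edge x -> y as the conditional probability #(x, y) / #(x).

    problems: list of sets of knowledge nodes, one set per logged problem.
    """
    single = Counter()
    pair = Counter()
    for nodes in problems:
        single.update(nodes)
        pair.update(permutations(sorted(nodes), 2))  # ordered pairs (x, y)
    return {(x, y): n / single[x] for (x, y), n in pair.items()}

def document_edge_weight(x, solved_problems):
    """Weight of edge x -> d: share of the problems solved by d that contain x.

    solved_problems plays the role of QL(d).
    """
    if not solved_problems:
        return 0.0
    hits = sum(1 for nodes in solved_problems if x in nodes)
    return hits / len(solved_problems)

# Toy problem log (made up for illustration only).
problems = [
    {"outlook", "outbox"},
    {"outlook", "form"},
    {"outlook", "outbox", "stuck"},
]
weights = knowledge_edge_weights(problems)
print(weights[("outlook", "outbox")], weights[("outbox", "outlook")])
print(document_edge_weight("outbox", problems))
```

Note that the knowledge-to-knowledge weights are asymmetric: in the toy log every problem mentioning "outbox" also mentions "outlook", but not the other way round.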
After the knowledge base has been built, information retrieval can be performed on the problem that a user inputs.
Referring to Fig. 3, an embodiment of the present application provides an information retrieval method, the method including:
Step 31: receive an input problem to be solved.
The input problem to be solved can be a problem that a user inputs through an electronic device.
Step 32: determine the technical field to which the problem to be solved belongs.
In the embodiment of the present application, the technical field of the problem to be solved can be determined from one or more keywords in the problem.
Step 33: determine, according to a pre-established knowledge base for the technical field, a target document in the technical field that matches the problem to be solved, where the knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects, each knowledge object being selected from a part of a problem object.
Step 34: return the target document.
In this application, the target document that matches the problem to be solved can be, for example, a target document that solves the problem, a target document that contains the problem, or a target document that contains one or more keywords of the problem.
In this application, returning the target document in step 34 may include: returning the title of the target document and/or returning content of the target document.
When retrieving on the basis of a user's problem to be solved (that is, the user's question), the embodiment of the present application considers not only one or more keywords in the problem but also the technical field of the problem. By taking the technical field into account and using a domain-specific knowledge base built in advance, the accuracy of the results retrieved for the user's problem can be greatly improved.
In the embodiment of the present application, determining in step 33 the target document in the technical field that matches the problem to be solved may include:
determining, according to the problem objects and knowledge objects in the knowledge base and the correspondence between the problem objects and the knowledge objects, the problems in the technical field that are similar to the problem to be solved;
determining a similarity score between each similar problem and the problem to be solved;
determining, based on the similarity scores and the target document corresponding to each similar problem, the target document that matches the problem to be solved.
It should be understood here that, based on the similarity scores and the target documents corresponding to the similar problems, the embodiment of the present application can directly select the target document of the similar problem with the highest similarity score as the target document that matches the problem to be solved. This returns a result to the user as quickly as possible and suits scenarios in which the user requires a fast response.
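The fast path described above amounts to a single maximum over the similar problems. The sketch below assumes the similar problems arrive as (problem id, similarity score, document id) tuples; that layout and the function name are hypothetical.

```python
def fastest_target_document(similar_problems):
    """Return the document of the similar problem with the highest score.

    similar_problems: list of (problem_id, similarity_score, document_id)
    tuples; this layout is an assumption made for the sketch.
    """
    if not similar_problems:
        return None
    _, _, document = max(similar_problems, key=lambda item: item[1])
    return document

# Toy scores (made up for illustration only).
candidates = [("q000", 0.9, "d1"), ("q001", 0.7, "d5")]
print(fastest_target_document(candidates))
```

Because no per-document aggregation is needed, this variant trades ranking quality for response time.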
Alternatively, in this application, the target documents corresponding to the similar problems can be treated as candidate documents, and determining the target document that matches the problem to be solved, based on the similarity scores and the target documents corresponding to the similar problems, may include:
determining, based on the similarity scores, the similarity between the problem to be solved and each of the candidate documents;
selecting one or more candidate documents, in descending order of their similarity to the problem to be solved, as the target documents that match the problem to be solved.
The similarity between the problem to be solved and each candidate document is computed from the following quantities: q denotes the problem to be solved; d denotes a candidate document; score(q, d) denotes the similarity between q and d; #(d, C) denotes the total number of times d occurs in C; #(d, C_0) denotes the number of times d occurs in C_0; (q'_i, d) ∈ C_0 indicates that d can solve the problem q'_i in C_0; score(q'_i, q) denotes the similarity score between q'_i and q; C_0 denotes a subset of the problem log C; and q' denotes a problem similar to q, with
$$C_0 = \{(q'_0, d'_0), (q'_1, d'_1), \ldots, (q'_m, d'_m)\}$$
where q'_i is the i-th problem similar to q, m is the total number of problems similar to q, and d' is the target document corresponding to q'.
Refer to Fig. 4, which shows the similar problems that belong to the same technical field as the problem to be solved, together with the target document corresponding to each similar problem. When determining the target document in the technical field that matches the problem to be solved, and referring to Fig. 4, if problem q000 is the similar problem with the highest similarity score with respect to the problem to be solved, the document d1 corresponding to q000 can be used as the target document. Alternatively, d5 and d1 (by way of example only) can both be returned as target documents, with d1 ranked ahead of d5 in the returned results.
Optionally, the order of the returned results can be determined from the similarity between the problem to be solved and each of the candidate documents. Correspondingly, after the target documents are determined in step 33, the information retrieval method provided by the embodiment of the present application may further include: calculating, based on a random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base; and reordering the multiple target documents based on that similarity.
After the target documents have been reordered, they can be returned in the reordered sequence.
In the embodiment of the present application, calculating the similarity between the problem to be solved and each document object in the knowledge base based on the random walk algorithm includes: selecting one or more nodes between the problem to be solved and the document objects and setting an index on each, where the index of a node represents the similarity of that node to each document object in the knowledge base; and calculating, based on the indexes set on the one or more nodes, the similarity between the problem to be solved and each document object in the knowledge base.
One way to select the nodes to index is to select the frequent nodes on paths, where a frequent node is a node whose product of in-degree and out-degree exceeds a threshold.
The random walk algorithm is a method for measuring the similarity of nodes. Starting from one node and moving at random toward another node according to the probability of each edge, the probability of reaching the other node is taken as the similarity between the starting node and that node. The similarity can be computed as:
$$s(x, y) = \sum_{x' \in N(x)} T(x, x')\, s(x', y)$$
where s(x, y) is the random-walk similarity between node x and node y, N(x) denotes all nodes connected to x, and T(x, x') denotes the probability of moving from node x to node x'.
In this application, normalized weights are used as the transition probabilities. When similarity is calculated with the random walk algorithm, only the edges from knowledge nodes to document nodes are retained; the edges from problem nodes to knowledge nodes are handled in the same way.
The random-walk similarity between a user problem node q and a document node d can be computed in different ways. One is the sampling-based method: starting from the user problem node q, the walk moves at random to an adjacent node, using the edge weights as transition probabilities. If N walks are sampled and r of them stop at document node d, the similarity of q and d is r/N. Experiments show that roughly 4,000,000 samples are needed before the node similarities converge, which makes the sampling-based method too slow for online queries, where the system must respond in real time. The other way is to set up a system of linear equations from the definition of random-walk similarity and solve it. Referring to Fig. 5, the line between each pair of nodes is labeled with the probability of walking from one node to the adjacent node, and a system of linear equations can be written from the values shown in Fig. 5.
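The sampling-based estimate r/N described above can be sketched as follows. The adjacency-list layout, the function name, and the toy graph are assumptions made for illustration; the sketch also assumes every walk eventually reaches a node without outgoing edges (a document), which holds for the toy graph.

```python
import random

def sample_similarity(graph, start, target, n_samples, seed=0):
    """Estimate s(start, target) as r / N by simulating random walks.

    graph: node -> list of (neighbor, transition_probability) pairs.
    Nodes without outgoing edges (documents) end the walk.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        node = start
        while graph.get(node):
            neighbors, probs = zip(*graph[node])
            node = rng.choices(neighbors, weights=probs, k=1)[0]
        if node == target:
            hits += 1
    return hits / n_samples

# Toy graph (made up): problem q, knowledge nodes k1 and k2, documents d1 and d2.
toy_graph = {
    "q": [("k1", 0.5), ("k2", 0.5)],
    "k1": [("d1", 1.0)],
    "k2": [("d1", 0.5), ("d2", 0.5)],
}
estimate = sample_similarity(toy_graph, "q", "d1", n_samples=20000)
print(estimate)
```

For this toy graph the exact similarity of q and d1 is 0.5 × 1.0 + 0.5 × 0.5 = 0.75, and the estimate approaches that value only as the number of samples grows, which illustrates why sampling is costly at query time.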
However, solving a system of linear equations is expensive: Gaussian elimination has complexity $O(n^3)$, where n is the number of unknowns in the system. The knowledge graph built here has a very large number of nodes, so to speed up the computation, indexes can be built on some nodes in advance. For an indexed node, the index is a sequence of floating-point numbers giving the similarity of that node to every document. For example, the index of node x has the form:
$$\mathrm{Index}(x) = \{\, s(x, d_0),\ s(x, d_1),\ \ldots,\ s(x, d_m) \,\}$$
where m is the number of documents. Once an index has been built on a node, its similarity to each document can be read off directly.
For example, if indexes are built in advance on nodes $v_5$, $v_8$, and $v_{10}$, the values $s(v_5, d_1) = 0.701$, $s(v_8, d_1) = 0.668$, and $s(v_{10}, d_1) = 0.642$ can be obtained directly, and the system of linear equations above can be simplified accordingly.
As this example shows, building indexes on some nodes in advance greatly reduces the number of unknowns in the equations (from 11 to 3).
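The effect of an index can be illustrated with a small exact computation. The sketch below evaluates the recurrence s(x, d) = Σ T(x, x') · s(x', d) by recursion instead of Gaussian elimination, which is a swapped-in simplification that is only valid when the graph has no cycles; the function name, data layout, and toy values are assumptions.

```python
def similarity(node, doc, graph, index, memo=None):
    """Exact random-walk similarity s(node, doc) on an acyclic graph.

    graph: node -> list of (neighbor, transition_probability) pairs.
    index: indexed node -> {document: precomputed similarity}.
    Indexed nodes are looked up instead of being expanded further.
    """
    if memo is None:
        memo = {}
    if node == doc:
        return 1.0
    if node in index:
        return index[node].get(doc, 0.0)
    if node not in memo:
        memo[node] = sum(
            p * similarity(nxt, doc, graph, index, memo)
            for nxt, p in graph.get(node, [])
        )
    return memo[node]

# Toy graph (made up); the edges leaving k2 are replaced by an index entry.
full_graph = {
    "q": [("k1", 0.5), ("k2", 0.5)],
    "k1": [("d1", 1.0)],
    "k2": [("d1", 0.5), ("d2", 0.5)],
}
pruned_graph = {"q": [("k1", 0.5), ("k2", 0.5)], "k1": [("d1", 1.0)]}
k2_index = {"k2": {"d1": 0.5, "d2": 0.5}}
print(similarity("q", "d1", full_graph, {}))
print(similarity("q", "d1", pruned_graph, k2_index))
```

Both calls give the same similarity, but the second never expands the subgraph below k2, which is the saving that the index is meant to provide.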
The embodiment of the present application proposes a greedy algorithm for selecting the nodes to materialize (that is, to index). In each round the algorithm selects a node that appears frequently on many paths, because frequent nodes tend to cover more paths. The product in-degree × out-degree is used as the measure of frequency: the larger the product, the higher the frequency. In each round, the node with the highest frequency is picked and added to the index nodes, and the frequencies of the other nodes are then recalculated; this continues until all materialized nodes have been obtained.
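A possible reading of this greedy selection is sketched below. The source does not say how frequencies are recalculated or when the loop stops, so two assumptions are made and labeled in the code: a picked node's edges are dropped before the remaining frequencies are recomputed, and selection stops once no node's in-degree × out-degree exceeds the threshold.

```python
def select_index_nodes(edges, threshold):
    """Greedily pick nodes whose in-degree x out-degree exceeds `threshold`.

    edges: iterable of directed (source, target) pairs.
    Assumption: after a node is picked, its edges are removed before the
    frequencies of the remaining nodes are recomputed.
    """
    remaining = set(edges)
    selected = []
    while True:
        in_deg, out_deg = {}, {}
        for src, dst in remaining:
            out_deg[src] = out_deg.get(src, 0) + 1
            in_deg[dst] = in_deg.get(dst, 0) + 1
        scores = {n: in_deg[n] * out_deg[n] for n in in_deg if n in out_deg}
        if not scores:
            break
        # Iterating in sorted order makes ties deterministic.
        best = max(sorted(scores), key=scores.get)
        if scores[best] <= threshold:
            break  # assumed stopping rule
        selected.append(best)
        remaining = {(s, d) for s, d in remaining if best not in (s, d)}
    return selected

# Toy graph (made up): two problems, knowledge nodes a and b, two documents.
toy_edges = [("q1", "a"), ("q2", "a"), ("a", "d1"), ("a", "d2"),
             ("q1", "b"), ("b", "d1")]
print(select_index_nodes(toy_edges, threshold=1))
```

In the toy graph, node a lies on four problem-to-document paths (2 × 2) and node b on one, so with a threshold of 1 only a is materialized.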
The index-based way of calculating similarity provided by the embodiment of the present application greatly reduces the amount of computation and improves efficiency. In addition, because the nodes to index are selected by frequency, indexes can be set on only some of the nodes rather than on all of them, which reduces the amount of computation further.
Fig. 6 is a structural block diagram of an information retrieval device provided by an embodiment of the present application. Referring to Fig. 6, the information retrieval device 600 provided by the embodiment includes an interface module 601 and a processing module 602, where:
the interface module 601 is configured to receive an input problem to be solved;
the processing module 602 is configured to determine the technical field to which the problem to be solved belongs;
the processing module 602 is further configured to determine, according to a pre-established knowledge base for the technical field, a target document in the technical field that matches the problem to be solved, where the knowledge base includes problem objects, knowledge objects, document objects, the correspondence between the problem objects and the knowledge objects, and the correspondence between the knowledge objects and the document objects, each knowledge object being selected from a part of a problem object;
the interface module 601 is further configured to return the target document.
When retrieving on the basis of a user's problem to be solved (that is, the user's question), the information retrieval device provided by the embodiment of the present application considers not only one or more keywords in the problem but also the technical field of the problem. By taking the technical field into account and using a domain-specific knowledge base built in advance, the accuracy of the results retrieved for the user's problem can be greatly improved.
Optionally, the target document that matches the problem to be solved is a target document that solves the problem to be solved.
The interface module is specifically configured to: return the title of the target document and/or return content of the target document.
Optionally, the processing module 602 is specifically configured to:
determine, according to the problem objects and knowledge objects in the knowledge base and the correspondence between the problem objects and the knowledge objects, the problems in the technical field that are similar to the problem to be solved;
determine a similarity score between each similar problem and the problem to be solved;
determine, based on the similarity scores and the target document corresponding to each similar problem, the target document that matches the problem to be solved.
Optionally, the target documents corresponding to the similar problems are treated as candidate documents, and the processing module 602 is specifically configured to:
determine, based on the similarity scores, the similarity between the problem to be solved and each of the candidate documents;
select one or more candidate documents, in descending order of their similarity to the problem to be solved, as the target documents that match the problem to be solved.
The similarity between the problem to be solved and each candidate document is computed from the following quantities: q denotes the problem to be solved; d denotes a candidate document; score(q, d) denotes the similarity between q and d; #(d, C) denotes the total number of times d occurs in C; #(d, C_0) denotes the number of times d occurs in C_0; (q'_i, d) ∈ C_0 indicates that d can solve the problem q'_i in C_0; score(q'_i, q) denotes the similarity score between q'_i and q; C_0 denotes a subset of the problem log C; and q' denotes a problem similar to q, with
$$C_0 = \{(q'_0, d'_0), (q'_1, d'_1), \ldots, (q'_m, d'_m)\}$$
where q'_i is the i-th problem similar to q, m is the total number of problems similar to q, and d' is the target document corresponding to q'.
Optionally, after the target documents are determined, the processing module 602 is further configured to:
calculate, based on the random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base;
reorder the multiple target documents based on the similarity between the problem to be solved and each document object in the knowledge base.
Optionally, when calculating the similarity between the problem to be solved and each document object in the knowledge base based on the random walk algorithm, the processing module 602 is specifically configured to:
select one or more nodes between the problem to be solved and the document objects and set an index on each, where the index of a node represents the similarity of that node to each document object in the knowledge base;
calculate, based on the indexes set on the one or more nodes, the similarity between the problem to be solved and each document object in the knowledge base.
Optionally, when selecting the nodes for which an index is set, the processing module 602 is specifically configured to:
select frequent nodes on the path and set an index for them, where a frequent node is a node for which the product of in-degree and out-degree is greater than a threshold.
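The frequent-node criterion can be sketched directly from its definition. In the Python below, the edge-list representation, the function name `frequent_nodes`, and the example threshold are assumptions; only the rule "in-degree times out-degree greater than a threshold" comes from the text.

```python
from collections import Counter


def frequent_nodes(edges, threshold):
    """Return nodes whose in-degree times out-degree exceeds the threshold.

    edges: iterable of directed (source, destination) pairs.
    """
    edges = list(edges)
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    # Counter returns 0 for missing keys, so pure sources and sinks score 0.
    return {n for n in nodes if in_deg[n] * out_deg[n] > threshold}


# Hypothetical graph: node k has in-degree 2 and out-degree 2 (product 4),
# node j has in-degree 1 and out-degree 1 (product 1).
toy_edges = [("q1", "k"), ("q2", "k"), ("k", "d1"), ("k", "d2"),
             ("q1", "j"), ("j", "d1")]
selected = frequent_nodes(toy_edges, threshold=2)
```

Nodes with a large product lie on many problem-to-document paths, which is a plausible reason to index them first, although the text does not state the rationale.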
It should be noted that the division into the functional modules described above is only an example of how the information retrieval device provided by the above embodiment may be organized. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the information retrieval device may be divided into different functional modules to perform all or part of the functions described above. In addition, the information retrieval device provided by the above embodiment and the information retrieval method embodiment belong to the same concept; for its specific implementation process, refer to the method embodiment, which is not repeated here.
It should also be understood that the interface module 601 and the processing module 602 may be different modules in the same physical device. Depending on the application, the interface module 601 may instead be distributed across one or more physical devices at different locations, and the processing module 602 may likewise be distributed across one or more physical devices at different locations.
An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the memory in the above embodiment, or it may exist separately without being assembled into a terminal. The computer-readable storage medium stores one or more programs, and the one or more programs are executed by one or more processors to perform the above information retrieval method.
Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which this application belongs. The words "first", "second", and the like used in the specification and claims of this application do not denote any order, quantity, or importance; they are used only to distinguish different components. Likewise, words such as "a" or "an" do not denote a limitation of quantity but indicate that at least one is present. Words such as "connect" or "connected" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only example embodiments of this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.
Claims (14)
1. An information retrieval method, characterized in that the method comprises:
receiving an input problem to be solved;
determining the technical field to which the problem to be solved belongs;
determining, according to a pre-established knowledge base of the technical field, a target document in the technical field that matches the problem to be solved, wherein the knowledge base comprises problem objects, knowledge objects, document objects, correspondences between the problem objects and the knowledge objects, and correspondences between the knowledge objects and the document objects, and a knowledge object is selected from a part of a problem object; and
returning the target document.
2. The method according to claim 1, characterized in that the target document that matches the problem to be solved is a target document for solving the problem to be solved; and
returning the target document comprises: returning the title of the target document and/or returning content in the target document.
3. The method according to claim 1, characterized in that determining the technical document in the technical field that matches the problem to be solved comprises:
determining, according to the problem objects, the knowledge objects, and the correspondences between the problem objects and the knowledge objects in the knowledge base, problems in the technical field that are similar to the problem to be solved;
determining a similarity score between each similar problem and the problem to be solved; and
determining, based on the similarity scores and the target document corresponding to each similar problem, the target document that matches the problem to be solved.
4. The method according to claim 3, characterized in that the target document corresponding to each similar problem serves as a candidate document, and determining, based on the similarity scores and the target document corresponding to each similar problem, the target document that matches the problem to be solved comprises:
determining, based on the similarity scores, the similarity between the problem to be solved and each of the candidate documents; and
selecting, in descending order of similarity between the problem to be solved and the candidate documents, one or more candidate documents as the target documents that match the problem to be solved;
wherein the similarity between the problem to be solved and each of the candidate documents is determined as follows:
$$\mathrm{score}(q, d) = \log \#(d, C) \times \sum_{(q'_i,\, d) \in C_0} \frac{\#(d, C_0)}{\#(d, C) \times i} \times \mathrm{score}(q'_i, q);$$
where q denotes the problem to be solved; d denotes a candidate document; score(q, d) denotes the similarity between the problem q to be solved and the candidate document d; #(d, C) denotes the total number of times d occurs in C; #(d, C_0) denotes the number of times d occurs in C_0; (q'_i, d) ∈ C_0 indicates that d can solve the problem q'_i in C_0; and score(q'_i, q) denotes the similarity score between q'_i and q; C_0 denotes a subset of the problem log C, and q' denotes a problem similar to the problem q to be solved, with
C_0 = {(q'_0, d'_0), (q'_1, d'_1), ..., (q'_m, d'_m)}, where q'_i denotes the i-th problem similar to q, m denotes the total number of problems similar to q, and d' denotes the target document corresponding to q'.
5. The method according to any one of claims 1 to 4, characterized in that, after the target document is determined, the method further comprises:
calculating, based on a random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base; and
re-ranking the multiple target documents based on the similarity between the problem to be solved and each document object in the knowledge base.
6. The method according to claim 5, characterized in that calculating, based on the random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base comprises:
selecting one or more nodes between the problem to be solved and the document objects and setting an index for them, wherein the index of a node represents the similarity of that node to each document object in the knowledge base; and
calculating, based on the indexes set for the one or more nodes, the similarity between the problem to be solved and each document object in the knowledge base.
7. The method according to claim 6, characterized in that selecting the nodes for which an index is set comprises:
selecting frequent nodes on the path and setting an index for them, wherein a frequent node is a node for which the product of in-degree and out-degree is greater than a threshold.
8. An information retrieval device, characterized in that the information retrieval device comprises:
an interface module, configured to receive an input problem to be solved; and
a processing module, configured to determine the technical field to which the problem to be solved belongs;
wherein the processing module is further configured to determine, according to a pre-established knowledge base of the technical field, a target document in the technical field that matches the problem to be solved, wherein the knowledge base comprises problem objects, knowledge objects, document objects, correspondences between the problem objects and the knowledge objects, and correspondences between the knowledge objects and the document objects, and a knowledge object is selected from a part of a problem object; and
the interface module is further configured to return the target document.
9. The information retrieval device according to claim 8, characterized in that the target document that matches the problem to be solved is a target document for solving the problem to be solved; and
the interface module is specifically configured to: return the title of the target document and/or return content in the target document.
10. The information retrieval device according to claim 8, characterized in that the processing module is specifically configured to:
determine, according to the problem objects, the knowledge objects, and the correspondences between the problem objects and the knowledge objects in the knowledge base, problems in the technical field that are similar to the problem to be solved;
determine a similarity score between each similar problem and the problem to be solved; and
determine, based on the similarity scores and the target document corresponding to each similar problem, the target document that matches the problem to be solved.
11. The information retrieval device according to claim 10, characterized in that the target document corresponding to each similar problem serves as a candidate document, and the processing module is specifically configured to:
determine, based on the similarity scores, the similarity between the problem to be solved and each of the candidate documents; and
select, in descending order of similarity between the problem to be solved and the candidate documents, one or more candidate documents as the target documents that match the problem to be solved;
wherein the similarity between the problem to be solved and each of the candidate documents is determined as follows:
$$\mathrm{score}(q, d) = \log \#(d, C) \times \sum_{(q'_i,\, d) \in C_0} \frac{\#(d, C_0)}{\#(d, C) \times i} \times \mathrm{score}(q'_i, q);$$
where q denotes the problem to be solved; d denotes a candidate document; score(q, d) denotes the similarity between the problem q to be solved and the candidate document d; #(d, C) denotes the total number of times d occurs in C; #(d, C_0) denotes the number of times d occurs in C_0; (q'_i, d) ∈ C_0 indicates that d can solve the problem q'_i in C_0; and score(q'_i, q) denotes the similarity score between q'_i and q; C_0 denotes a subset of the problem log C, and q' denotes a problem similar to the problem q to be solved, with
C_0 = {(q'_0, d'_0), (q'_1, d'_1), ..., (q'_m, d'_m)}, where q'_i denotes the i-th problem similar to q, m denotes the total number of problems similar to q, and d' denotes the target document corresponding to q'.
12. The information retrieval device according to any one of claims 8 to 11, characterized in that, after the target document is determined, the processing module is further configured to:
calculate, based on a random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base; and
re-rank the multiple target documents based on the similarity between the problem to be solved and each document object in the knowledge base.
13. The information retrieval device according to claim 12, characterized in that, when calculating, based on the random walk algorithm, the similarity between the problem to be solved and each document object in the knowledge base, the processing module is specifically configured to:
select one or more nodes between the problem to be solved and the document objects and set an index for them, wherein the index of a node represents the similarity of that node to each document object in the knowledge base; and
calculate, based on the indexes set for the one or more nodes, the similarity between the problem to be solved and each document object in the knowledge base.
14. The information retrieval device according to claim 13, characterized in that, when selecting the nodes for which an index is set, the processing module is specifically configured to:
select frequent nodes on the path and set an index for them, wherein a frequent node is a node for which the product of in-degree and out-degree is greater than a threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710217499.5A CN107122421A (en) | 2017-04-05 | 2017-04-05 | Information retrieval method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107122421A (en) | 2017-09-01 |
Family
ID=59726211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710217499.5A Pending CN107122421A (en) | 2017-04-05 | 2017-04-05 | Information retrieval method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107122421A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1794240A (en) * | 2006-01-09 | 2006-06-28 | 北京大学深圳研究生院 | Computer information retrieval system based on natural speech understanding and its searching method |
CN101373532A (en) * | 2008-07-10 | 2009-02-25 | 昆明理工大学 | FAQ Chinese request-answering system implementing method in tourism field |
CN102129477A (en) * | 2011-04-23 | 2011-07-20 | 山东大学 | Multimode-combined image reordering method |
CN102779182A (en) * | 2012-07-02 | 2012-11-14 | 吉林大学 | Collaborative filtering recommendation method for integrating preference relationship and trust relationship |
JP5697202B2 (en) * | 2011-03-08 | 2015-04-08 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Method, program and system for finding correspondence of terms |
CN106294505A (en) * | 2015-06-10 | 2017-01-04 | 华中师范大学 | A kind of method and apparatus feeding back answer |
CN106372087A (en) * | 2015-07-23 | 2017-02-01 | 北京大学 | Information retrieval-oriented information map generation method and dynamic updating method |
Non-Patent Citations (2)
Title |
---|
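SHUO YANG et al.: "Efficiently Answering Technical Questions - A Knowledge Graph Approach", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence * |
SONG Chen et al.: "Improved label propagation algorithm based on a random walk similarity matrix", Computer Applications and Software (计算机应用与软件) * |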
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885842A (en) * | 2017-11-10 | 2018-04-06 | 上海智臻智能网络科技股份有限公司 | Method, apparatus, server and the storage medium of intelligent answer |
CN107885842B (en) * | 2017-11-10 | 2021-01-08 | 上海智臻智能网络科技股份有限公司 | Intelligent question and answer method, device, server and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170901 |