CN109033314A

CN109033314A - Real-time query method and system for large-scale knowledge graphs under limited memory

Info

Publication number: CN109033314A
Application number: CN201810787762.9A
Authority: CN
Inventors: 王宏志; 万晓珑; 高宏
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2018-07-18
Filing date: 2018-07-18
Publication date: 2018-12-18
Anticipated expiration: 2038-07-18
Also published as: CN109033314B

Abstract

The present invention relates to the technical field of data processing and provides a real-time query method and system for large-scale knowledge graphs under limited memory. The method comprises: processing and analyzing the original knowledge graph to obtain an inverted-file hash list; constructing a multi-level structure index based on the original knowledge graph; and parsing a query statement to obtain target words, then looking up the triples corresponding to the target words according to the inverted-file hash list and the multi-level structure index to generate a result subgraph. The invention greatly improves single-machine knowledge graph query capability and, even when memory is extremely limited, can provide a result set that meets both the user's time requirement and the user's accuracy requirement.

Description

Real-time query method and system for large-scale knowledge graphs under limited memory

Technical field

The present invention relates to the technical field of data processing, and in particular to a real-time query method and system for large-scale knowledge graphs under limited memory.

Background art

Since its birth, the World Wide Web has grown into a huge network whose nodes are individual web pages, linked to one another by hyperlinks. Based on this simple and open technology, modern search engines can find the web pages relevant to a question in the vast web space. However, with the development of the mobile Internet and the limited screen space of mobile devices, users expect a search engine to return accurate results directly rather than having to look through the search results one by one. Merely storing web pages cannot meet this accuracy requirement.

To address this need, XML (Extensible Markup Language), RDF (Resource Description Framework), OWL (Web Ontology Language) and others were proposed for describing information on the network. XML adds tags to documents and data content to facilitate data exchange; RDF describes the semantic relations of network resources as (subject, predicate, object) triples; OWL makes it possible to describe ontology concepts and has strong expressive power and interpretability. Building on these three ways of describing Internet information, the concept of the knowledge graph has been proposed in recent years. Entities and entity attributes in web pages are identified and stored in the knowledge graph; when a user initiates a search, the user's intent can be understood accurately from the nodes known in the knowledge graph, and an accurate answer can be given.

At present, the main storage and query methods for knowledge graphs in RDF triple form are: those based on one giant triple table, those based on multiple property-clustered tables, and those based on multiple vertically partitioned tables. The giant-triple-table form stores all triples in one huge three-column table; the main systems of this kind are RDF-3x and Hexastore. The property-clustered form has two main types of tables: tuple-attribute clustered tables and tables for objects with similar attributes. The vertically partitioned form builds a separate two-column table for each attribute, storing the subject and the object. RDF storage systems based on these three forms include Jena, Yars2, Sesame 2.0, SW-store, RDF-3x, x-RDF-3x, Hexastore, gStore and others.

Existing RDF storage and query systems such as Jena, Yars2 and Sesame 2.0 perform poorly on larger RDF data sets. SW-store, RDF-3x, x-RDF-3x and Hexastore handle larger RDF data sets by using mapping dictionaries, but they support only a fixed SPARQL language. Moreover, most current methods cannot quickly handle online updates of RDF data. For example, in Jena, which is based on multiple property-clustered tables, updating the attribute information of data requires re-clustering and rebuilding the attribute tables on its data set. In SW-store, an update requires rewriting many columns, so the update cost is also quite high; even with the "overflow table + batch write" approach, it is hard to use in applications with high real-time requirements. In addition, much RDF data tends to be loosely structured; for example, data of the same type do not all have the same attributes. This loose structure is conducive to data integration, but it poses difficulties for many classic methods that accelerate query processing with relational techniques. Although gStore solves part of the above problems using the T-index method, the data set size that a single machine supports with the T-index structure is limited: it can only support data management tasks for RDF knowledge graphs on the scale of one billion triples.

However, as human knowledge keeps growing, the scale of knowledge graphs grows accordingly, and their size far exceeds one billion tuples. The computing power of common computing devices falls far behind the growth rate of knowledge graphs, so query processing on them becomes increasingly difficult for ordinary users. For example, Freebase is about 380 GB, whereas an ordinary user currently has about 8 GB of memory; an average PC user querying it directly generates a large number of I/O operations, which greatly wastes the user's time. However, most ordinary users do not need highly exact results and only need the query program to provide an approximate solution. With the rise of approximate query processing technology, more and more research results show that in most cases an approximate answer meets the user's needs while largely saving the user's computing time and lowering the requirements on computing equipment.

Summary of the invention

The technical problem to be solved by the present invention is to provide, in view of one or more of the above defects in the prior art, a real-time query method and system for large-scale knowledge graphs under limited memory.

To solve the above technical problem, the present invention provides a real-time query method for large-scale knowledge graphs under limited memory, comprising:

Processing and analyzing the original knowledge graph to obtain an inverted-file hash list;

Constructing a multi-level structure index based on the original knowledge graph;

Parsing a query statement to obtain target words, and looking up the triples corresponding to the target words according to the inverted-file hash list and the multi-level structure index to generate a result subgraph.

Optionally, processing and analyzing the original knowledge graph to obtain the inverted-file hash list comprises:

Extracting tuple information in offset-then-word form from the original knowledge graph;

Converting the extracted tuple information into word-then-offset form;

Sorting the tuple information in word-then-offset form by word to obtain an inverted file;

Hashing the resulting inverted file to obtain the inverted-file hash list.

Optionally, constructing the multi-level structure index based on the original knowledge graph comprises:

Performing data classification, cleaning and simplified data representation on the preliminary structure discovery result of the original knowledge graph to obtain a simplified knowledge-graph data classification result;

Extracting bottom-level structure nodes based on the simplified knowledge-graph data classification result;

Further extracting from the simplified knowledge-graph data classification result to realize the upper-level structure index.

Optionally, the step of parsing the query statement to obtain target words and looking up the triples corresponding to the target words according to the inverted-file hash list and the multi-level structure index to generate a result subgraph comprises:

Receiving the query statement Q entered by the user, the lower limit min on the number of returned tuples, the upper limit max on the number of returned tuples, and the sampling ratio δ;

Parsing the query statement Q to obtain the set of words to be queried;

For each word in the word set, finding the corresponding disk index sets {S₁, S₂, ..., S_n} in parallel in the inverted-file hash list, and taking their intersection to obtain the disk-index intersection S;

Determining whether the length of the disk-index intersection S is less than the lower limit min on the number of returned tuples:

If so, for every index position in the disk-index intersection S, adding the index and its position information to the result subgraph as a node;

Otherwise, determining whether the length of S is greater than the upper limit max on the number of returned tuples: if so, setting the sample size to max; otherwise setting the sample size to the product of the length of S and the sampling ratio δ, and, if that sample size is less than the lower limit min, setting the sample size to min. After the sample size is determined, semi-random sampling is performed on S, which uses the auxiliary sampling node information superNode obtained in step S102. Each index obtained by sampling, together with its position information in the multi-level structure index, is added to the result subgraph.
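The query-time steps above (per-word lookup, intersection, and choice of sample size) can be sketched as follows. This is only a minimal illustration under stated assumptions: the inverted-file hash list is modeled as an in-memory Python dict from word to disk positions, the lookup is sequential rather than parallel, δ is assumed to lie in (0, 1], and the names `intersect_positions` and `sample_size` are hypothetical, not taken from the patent.

```python
from functools import reduce


def intersect_positions(hash_list, words):
    """Intersect the disk-position sets of all query words.

    hash_list: dict mapping word -> iterable of disk positions
               (an in-memory stand-in for the inverted-file hash list).
    """
    position_sets = [set(hash_list.get(w, ())) for w in words]
    if not position_sets:
        return set()
    return reduce(set.intersection, position_sets)


def sample_size(length, lo, hi, delta):
    """Number of index positions to keep for an intersection of `length` entries.

    lo, hi: lower / upper limit on returned tuples (min and max in the text).
    delta:  sampling ratio, assumed to be in (0, 1].
    """
    if length < lo:
        return length                      # small result: keep everything
    if length > hi:
        return hi                          # too large: cap at the upper limit
    return max(lo, int(length * delta))    # sample by ratio, never below lo
```

For instance, with `lo=5`, `hi=100` and `delta=0.5`, an intersection of 50 positions would be sampled down to 25, while one of 1000 positions would be capped at 100.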

The present invention also provides a real-time query system for large-scale knowledge graphs under limited memory, comprising: a hash list establishment unit, a multi-level index construction unit and a query unit;

The hash list establishment unit is configured to process and analyze the original knowledge graph to obtain the inverted-file hash list;

The multi-level index construction unit is configured to construct the multi-level structure index based on the original knowledge graph;

The query unit is configured to parse a query statement to obtain target words, and to look up the triples corresponding to the target words according to the inverted-file hash list and the multi-level structure index to generate a result subgraph.

Optionally, the hash list establishment unit is configured to perform the following steps:

Optionally, the multi-level index construction unit is configured to perform the following steps:

Further extracting from the simplified knowledge-graph data classification result to realize the upper-level structure index.

Optionally, the query unit is configured to perform the following steps:

Implementing the real-time query method and system for large-scale knowledge graphs under limited memory provided by the embodiments of the present invention has at least the following beneficial effects:

1. The invention balances the user's needs against the capability of the user's equipment: the inverted index and the structure index improve the user's single-machine data processing capability, and the user's result set can be found in a very short time.

2. By further incorporating approximate query processing techniques and applying ideas from the field of approximate query processing, the invention extracts a subgraph structure after obtaining the large result set the user specified. This saves the user's query time, reduces the constraint that memory space places on the query engine, and returns, according to the user's intent, a result the user can understand quickly.

Brief description of the drawings

Fig. 1 is a flowchart of the real-time query method for large-scale knowledge graphs under limited memory provided by Embodiment One of the present invention;

Fig. 2 is a schematic diagram of the principle of the present invention;

Figs. 3a, 3b and 3c are, respectively, a schematic diagram of the bottom-level structure extracted by the invention, a schematic diagram of bottom-level nodes and their relations, and a schematic diagram of upper-level nodes and their relations;

Fig. 4 is a schematic diagram of the real-time query system for large-scale knowledge graphs under limited memory provided by Embodiment Five of the present invention;

In the figures: 401: hash list establishment unit; 402: multi-level index construction unit; 403: query unit.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.

Embodiment one

Refer to Fig. 1, which is a flowchart of the real-time query method for large-scale knowledge graphs under limited memory provided by Embodiment One of the present invention; Fig. 2 is a schematic diagram of the principle of the invention. As shown, the method may include the following steps:

Step S101: Perform the inverted-file hash list establishment step, i.e., process and analyze the original knowledge graph to obtain the inverted-file hash list. Because the word repetition rate is high in very large non-numeric knowledge graphs, an inverted file makes it possible to locate triples quickly by word; hashing the inverted file speeds up word lookup and reduces file I/O operations.

Step S102: Perform the multi-level structure index construction step, i.e., construct the multi-level structure index based on the original knowledge graph.

Step S103: At query time, answer the query statement according to the inverted-file hash list, the multi-level structure index and the original knowledge graph.

The key point of the invention is to satisfy the query time and precision requirements of an average PC user using very little memory, i.e., real-time knowledge graph querying in limited memory. This avoids the situation where a small user memory leads to a large number of I/O operations, low CPU utilization, excessive file-reading time and extremely long user waiting time.

For current knowledge graphs based on the RDF structure, the invention adopts structure extraction and processes the data vertices in layers to obtain a simplified vertex structure. A hash structure is added to the inverted-file design so that looking up a tuple takes O(1) time. Combining the two structures, the user's result set can be found in O(1) time. By incorporating approximate query processing techniques and applying ideas from the field of approximate query processing, a subgraph structure is extracted after the large result set the user specified has been obtained. This saves the user's query time, reduces the constraint that memory space places on the query engine, and returns, according to the user's intent, a result the user can understand quickly.

The invention overcomes the difficulty of extracting knowledge graph structure from loosely structured knowledge graphs and, in terms of the time complexity of operating on large data sets, guarantees short offline data processing time and short online search time.

Embodiment two

On the basis of the real-time query method for large-scale knowledge graphs under limited memory provided by Embodiment One, the process in step S101 of processing and analyzing the original knowledge graph to obtain the inverted-file hash list may be implemented as follows:

Step 1: Extract tuple information in offset-then-word form from the original knowledge graph. The offset-then-word form is (offset, word, ..., word); that is, step 1 extracts tuple information of the form (offset, word, ..., word) from the original knowledge graph.

Step 2: Convert the extracted tuple information into word-then-offset form. The word-then-offset form is (word, offset, ..., offset); that is, step 2 converts tuple information of the form (offset, word, ..., word) into the form (word, offset, ..., offset).

Step 3: Sort the tuple information in word-then-offset form, (word, offset, ..., offset), by word to obtain inverted files;

Step 3 includes:

Step 3.1: Merge the offset information of repeated words within each batch of 100,000 adjacent tuples;

Step 3.2: Sort in memory in units of 100,000 tuples;

Step 3.3: Merge-sort the files obtained above;

Step 3.4: Obtain the sorted (word, offset, ..., offset) tuples.

Step 4: Hash the resulting inverted file to obtain the inverted-file hash list, which improves subsequent lookup efficiency.

The algorithm for constructing the inverted-file hash list is shown in Algorithm 1 below, in which lines 1-11 correspond to steps 1 to 3 above. Lines 1-7 extract (v, p, ..., p) tuples from the original knowledge graph, that is, the large-scale knowledge graph G; the number of tuples extracted into each inverted file does not exceed a preset number Max. When "list.addAndSort(extract(triple))" is executed to add an extracted tuple to the list, the (v, p, ..., p) tuple must be converted into (p, v, ..., v) form and sorted by word. This yields a sorted result set within each Max-sized range, which is written to a file to give an inverted file. In lines 8-11, we obtain the number of sorted inverted files produced in the previous step. In lines 12-18, the inverted-file hash list fileList is obtained by selecting a hash function and merging all inverted files.
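Since Algorithm 1 itself is not reproduced in this text, steps 1 to 4 can be illustrated with the following sketch. It is not the patent's algorithm: the sorted runs are kept in memory instead of being written to inverted files on disk, a Python dict stands in for the chosen hash function, and the names `build_runs` and `build_hash_list` are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import groupby


def build_runs(records, chunk_size):
    """Turn (offset, [words]) records into sorted runs of (word, [offsets]).

    Each run covers at most `chunk_size` records (100,000 in the text).
    """
    runs = []
    for start in range(0, len(records), chunk_size):
        merged = defaultdict(list)
        for offset, words in records[start:start + chunk_size]:
            for word in words:
                merged[word].append(offset)   # merge repeated words in the chunk
        runs.append(sorted(merged.items()))   # in-memory sort of one chunk by word
    return runs


def build_hash_list(runs):
    """Merge the sorted runs by word and hash the result (dict keyed by word)."""
    hash_list = {}
    merged = heapq.merge(*runs, key=lambda item: item[0])
    for word, group in groupby(merged, key=lambda item: item[0]):
        hash_list[word] = sorted(o for _, offsets in group for o in offsets)
    return hash_list
```

With the dict in place, finding the disk offsets of a word is an average-case O(1) lookup, which is the property the text relies on.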

Embodiment three

On the basis of the real-time query method for large-scale knowledge graphs under limited memory provided by Embodiment Two, the process in step S102 of constructing the multi-level structure index based on the original knowledge graph may be implemented as follows:

The invention first performs preliminary structure discovery on the original knowledge graph, separating out the ontology layer, and then builds the multi-level index structure, which comprises three parts: knowledge graph structure depth analysis, establishment of the knowledge graph storage node index, and establishment of the overall structure index.

(1) Knowledge graph structure depth analysis: perform data classification, cleaning and simplified data representation on the preliminary structure discovery result of the knowledge graph to obtain the simplified knowledge-graph data classification result. The simplified data representation is a transformation of the original RDF knowledge graph: the original knowledge graph contains much redundant information, and this step removes that redundancy.

(2) Establishing the knowledge graph storage node index: extract the disk position at which each vertex first appears in the original knowledge graph (in principle, RDF triples with the same subject are stored adjacently on disk), and use quicksort to sort the tuples of these disk positions by the ordering relation between nodes to obtain the knowledge graph storage node index; here a node is a disk position.

(3) Establishing the overall structure index: further extract disk positions from the simplified knowledge-graph data classification result to realize the upper-level structure index, then combine the knowledge graph storage node index with the upper-level structure index to obtain the multi-level structure index.

A knowledge graph developed from an ontology language carries basic ontology concepts, including a set of real-world objects and a set of relations between real-world objects. Such a knowledge graph can easily be divided into an ontology (concept) layer and a fact (object) layer. Clearly, in a large-scale knowledge graph the ontology layer has many instances in the fact layer. Using this property, the invention can easily extract the ontology layer of a knowledge graph with data mining techniques and then separate its ontology layer from its fact layer, completing the construction of the multi-level structure index. The invention can use a bottom-up method to layer the knowledge graph automatically. A critical step before layering is knowledge graph cleaning, which uses certain encoding rules to reduce the redundancy in the knowledge graph. At the same time, the invention extracts the leaf nodes of the knowledge graph and their disk position information as the bottom-level structure of the multi-level index. These bottom-level nodes are then used to extract the relation information between bottom-level nodes and the upper-level node information, further separating the knowledge graph. For example, the bottom-level structure obtained by the invention is shown in Fig. 3a; the next round of the loop yields the node relation information of this level (Fig. 3b) and the node information of the level above together with the relation information between upper and lower nodes (Fig. 3c).
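Part (2) above can be illustrated with a short sketch. It assumes the graph is available as (disk position, triple) pairs and that a vertex is identified by the subject of a triple; the text names quicksort, whereas this sketch simply uses Python's built-in `sorted` (Timsort), and `build_node_index` is a hypothetical name.

```python
def build_node_index(triples_with_pos):
    """Return (vertex, first disk position) pairs ordered by disk position.

    triples_with_pos: iterable of (position, (subject, predicate, obj)).
    """
    first_pos = {}
    for pos, (subject, _predicate, _obj) in triples_with_pos:
        # Keep the smallest position seen for each subject, so the result
        # does not depend on the order in which the triples are read.
        if subject not in first_pos or pos < first_pos[subject]:
            first_pos[subject] = pos
    # Built-in sort used here in place of the quicksort named in the text.
    return sorted(first_pos.items(), key=lambda kv: kv[1])
```

Because triples with the same subject are stored adjacently, the first position of a subject is enough to locate its whole block of triples on disk.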

In one embodiment of the invention, the construction of the above multi-level structure index may specifically include the following steps:

Step 1: Extract the bottom-level structure nodes of the large-scale knowledge graph G. Specifically, for each triple traversed in G, determine whether the object of the triple is a leaf node. If so, add the subject of the triple and its position information to set N₀, and add the subject and position information to the multi-level structure index as a node; otherwise, add the subject of the triple and its position information to set N₁.

Step 2: Build the upper-level node index of the current level's nodes and the association information of the current level's nodes, specifically:

While set N₁ is not empty, let S₀ = N₀ and S₁ = N₁, and reset N₀ and N₁ to empty sets. Traverse each (triple, position) in S₁, and for the current (triple, position):

If the object of the triple is in S₀ and the subject is not in S₀, extract (triple subject, position) and add it to N₀, and extract (triple subject, position, position in S₀ of the triple's object) and add it to the multi-level structure index;

If the object of the triple is in S₀ and the subject is also in S₀, extract (position in S₀ of the triple's subject, position in S₀ of the triple's object) and add it to the multi-level structure index;

Otherwise, add the (triple, position) of the triple to N₁;

Step 3: Extract the high-level nodes (high-level node information) from the multi-level structure index.

Algorithm 2 below shows in detail how the multi-level structure index is extracted by an automated knowledge graph level mining method. Lines 1-7 of the algorithm extract the bottom-level structure nodes of the large-scale knowledge graph G. In the loop that follows, the algorithm builds the structure index progressively. Lines 11-13 build the upper-level node index of the current node level and the relation between the two levels' node indexes. Lines 14 and 15 build the association information of the current level's nodes. Note that, in order to associate the level index with the inverted-file hash list, both store nodes as key-value pairs: the "key" is each node's position on disk, and the "value" is the information needed by our various algorithms. The above process is a loop: N₀ represents the lower-level nodes extracted, and N₁ represents the upper-level nodes extracted; in the next iteration they are assigned to S₀ and S₁. Finally, in line 18, the high-level nodes (high-level node information) superNode are extracted from the resulting structure index to serve our subsequent search algorithm.
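Algorithm 2 is not reproduced in this text, so the following is only a rough sketch of the bottom-up layering described in steps 1 to 3. Several points are assumptions made for illustration: an object counts as a leaf when it never appears as a subject, unresolved triples are kept whole in N₁ so that step 2 can inspect them, the loop stops when a pass makes no progress (a guard not mentioned in the text), and superNode is taken to be every node above level 0. The names `build_multilevel_index` and `super_nodes` are hypothetical.

```python
def build_multilevel_index(triples):
    """triples: list of (position, (subject, predicate, obj)).

    Returns {"nodes": {position: (name, level)}, "edges": [(pos_a, pos_b)]}.
    """
    subjects = {s for _, (s, _, _) in triples}
    index = {"nodes": {}, "edges": []}
    lower, pending = {}, []                 # N0: name -> position; N1: unresolved
    for pos, (s, p, o) in triples:          # step 1: bottom-level nodes
        if o not in subjects:               # assumption: leaf = never a subject
            if s not in lower:
                lower[s] = pos
                index["nodes"][pos] = (s, 0)
        else:
            pending.append((pos, (s, p, o)))
    level = 0
    while pending:                          # step 2: build upper levels
        level += 1
        s0, s1 = lower, pending             # S0 = N0, S1 = N1
        lower, pending = {}, []             # reset N0 and N1
        for pos, (s, p, o) in s1:
            if o in s0 and s not in s0:     # new upper-level node above s0[o]
                upper_pos = lower.setdefault(s, pos)
                index["nodes"].setdefault(upper_pos, (s, level))
                index["edges"].append((upper_pos, s0[o]))
            elif o in s0:                   # relation inside the current level
                index["edges"].append((s0[s], s0[o]))
            else:
                pending.append((pos, (s, p, o)))
        if len(pending) == len(s1):         # guard: no progress, stop
            break
    return index


def super_nodes(index):
    """Step 3 (assumed meaning): positions of all nodes above the bottom level."""
    return {pos for pos, (_, lvl) in index["nodes"].items() if lvl > 0}
```

Both the node table and the hash list are keyed by disk position, which is what lets the query step join the two structures with plain dictionary lookups.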

Embodiment four

On the basis of the real-time query method for large-scale knowledge graphs under limited memory provided by Embodiment Three, the process in step S103 of parsing the query statement to obtain target words and looking up the triples corresponding to the target words according to the inverted-file hash list and the multi-level structure index to generate a result subgraph may be implemented as follows:

Step 1: Receive the query statement Q entered by the user, the lower limit min on the number of returned tuples, the upper limit max on the number of returned tuples, and the sampling ratio δ;

Step 2: Parse the query statement Q to obtain the set of words to be queried;

Step 3: For each word in the word set, find the corresponding disk index sets {S₁, S₂, ..., S_n} in parallel in the inverted-file hash list fileList, and take their intersection to obtain the disk-index intersection S, where n is the number of words in the word set.

Step 4: Determine whether the length of the disk-index intersection S is less than the lower limit min on the number of returned tuples:

The present invention uses the results of steps S101 and S102 to serve the search. The inverted-file hash list is used to find the tuple positions the user's query is looking for, and the multi-level index is used to find their adjacent vertex structure, achieving fast structural queries under limited memory. However, using these results alone is far from enough; many problems remain in the search process, for example whether the query entered by the user is an exact query, and how to handle a huge result set returned by the inverted-file hash list. The invention places no restriction on the queries users enter, and it is precisely this freedom that can make queries inexact. An inexact query leads to a widely distributed result set; with such a result set, even with the inverted-file hash list and the multi-level index, the user's memory may be unable to handle the query result. For a large-scale knowledge graph, suppose the query entered by the user is highly precise, for example a query for "Award winner", an instance of the upper-level node word 'Award'; the invention is then certain to return an accurate and complete result in an efficient time. But what if the user wants to look up the word 'Award' itself? In this case, even when the result we return (the result the user's memory has to handle) contains no neighbor information, its size is still several times or even tens of times the memory the user provides to the search algorithm, not to mention the DESCRIBE statement in SPARQL. Moreover, such cases occur especially frequently in user queries: when a user knows little about the content being queried, the available approach is to query the knowledge graph from the macroscopic to the microscopic. In short, in most cases the user cannot provide an exact query statement. Therefore, how to achieve efficient queries in the face of such inexact query statements is the main problem that the query method of the present invention has to overcome.

To solve the problems discussed above and to balance the accuracy and query time requirements of user queries, the invention incorporates some ideas from approximate query processing into the search method; that is, a search may return a semi-exact result to the user. From a certain angle, the search algorithm of the invention is very close to online querying in approximate query processing. However, in an approximate query processing system, because the user's query targets the entire data set, the user needs to specify a sampling ratio. For every query, however the sampling is pushed down, a correct query statement in an approximate query processing system is in fact executed using samplers, and guaranteeing precision becomes increasingly complex and difficult as the samplers are pushed further down.

Because the present invention has the inverted-file hash list structure, not every query requires a sampling operation, which guarantees the absolute accuracy of a portion of the queries. When the user performs a fuzzy query, the user specifies two variables, a size interval [Min, Max] for the result graph and an expected sampling ratio Eδ, and a semi-random sampler provides the precision guarantee. Clearly, when the size of the result set obtained from the inverted-file hash list is less than or equal to Min, the result does not need to be sampled; the triple positions found by the query are assembled directly into a subgraph structure and presented to the user. When the size exceeds Max, the result set is sampled semi-randomly at a sampling ratio of Max ÷ length(results). When the result set size lies within the user-specified interval, the result set is first sampled semi-randomly at the user's expected sampling ratio Eδ; if the resulting sample is smaller than Min, sampling is instead performed at an actual sampling ratio of Min ÷ length(results); if the sample is within the interval, the actual sampling ratio Aδ equals the user's expected sampling ratio Eδ. The result set size range [Min, Max] specified by the user is therefore absolute, and the algorithm works strictly within it, whereas the expected sampling ratio Eδ may change according to the actual situation, and the algorithm returns the actual sampling ratio Aδ at the end. It is also worth noting that the precision guarantee is a very important metric in approximate query processing. The present invention uses a semi-random sampling function to guarantee the precision of the result. "Semi-random" means that the superNodes obtained earlier are used to retain upper-layer nodes during sampling.
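The sampling rule described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the representation of results as a list, and the way superNodes are retained (kept first, with the remainder drawn uniformly at random) are all assumptions made for the example.

```python
import random

def actual_sampling_ratio(n, min_size, max_size, expected_ratio):
    """Return the actual sampling ratio (A-delta) for a result set of size n."""
    if n <= min_size:
        return 1.0                 # small result set: no sampling needed
    if n > max_size:
        return max_size / n        # cap the sample at Max
    if n * expected_ratio < min_size:
        return min_size / n        # raise the ratio so that Min results remain
    return expected_ratio          # sample lies in [Min, Max]: A-delta equals E-delta

def semi_random_sample(results, super_nodes, ratio, rng):
    """Assumed semi-random scheme: keep superNodes first, fill the rest at random."""
    k = round(len(results) * ratio)
    kept = [r for r in results if r in super_nodes][:k]
    rest = [r for r in results if r not in super_nodes]
    kept += rng.sample(rest, min(k - len(kept), len(rest)))
    return kept
```

For example, with Min = 100, Max = 500 and Eδ = 0.5, a result set of 150 entries would yield only 75 samples at the expected ratio, so the actual ratio is raised to 100 ÷ 150.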

The pseudocode for the specific implementation of step S103 is shown in Algorithm 3 below. In line 1, the query statement input by the user is parsed to find the target vocabulary of the query. Then, in lines 2-6, the inverted-file hash list obtained by Algorithm 2 and the target vocabulary obtained in the previous step are used to locate all triples involved in the user's query and to determine how the results are distributed. Because the user does not know whether the specified query statement is accurate and cannot accurately predict the size of the query result, the present invention requires the user to give, before each query, a result set size range [Min, Max] and an expected sampling ratio Eδ, so that the result set fits in the user's memory and real-time query efficiency is guaranteed. Accordingly, in line 7 the result subset size range is set to [Min, Max]. In line 9, whether to sample and how to sample are decided from the distribution positions of the results obtained through the inverted index and from the result set size. Lines 10-11 and 20-21 then construct the subgraph structure. It is worth explaining that G* is a subgraph structure that adjusts itself automatically: each time a new node is added, G* automatically re-indexes the result set by level. Moreover, the multi-level index is stored as a key-value structure, so extracting the multi-level structure index takes O(1) time, and constructing the subgraph structure G* in O(1) time is therefore clearly feasible.
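The constant-time behavior attributed to G* can be illustrated with a small Python sketch. The class below is a hypothetical stand-in for G*, not the structure defined by Algorithm 3: it assumes the multi-level structure index is a key-value map from node to level, and it files each inserted node under its level using hash-based containers, so each lookup and insertion takes O(1) time on average.

```python
from collections import defaultdict

class ResultSubgraph:
    """Hypothetical stand-in for G*: groups result nodes by hierarchy level."""

    def __init__(self, level_index):
        # level_index: assumed key-value form of the multi-level structure
        # index, mapping each node to its level.
        self.level_index = level_index
        self.levels = defaultdict(set)   # level -> nodes at that level
        self.edges = []                  # (subject, predicate, object) triples

    def add_triple(self, subject, predicate, obj):
        # Each node is placed under its level as it is added; the dict lookup
        # and set insertion are O(1) on average.
        for node in (subject, obj):
            level = self.level_index.get(node, -1)  # -1: node not in the index
            self.levels[level].add(node)
        self.edges.append((subject, predicate, obj))
```

The level value -1 for nodes missing from the index is a convention chosen for this sketch only.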

Embodiment five

As shown in FIG. 4, the real-time query system for a large-scale knowledge graph under limited memory provided by Embodiment Five of the present invention may include: a hash list establishing unit 401, a multi-level index construction unit 402, and a query unit 403.

The hash list establishing unit 401 is configured to process and analyze the original knowledge graph to obtain the inverted-file hash list. The operations performed by the hash list establishing unit 401 are the same as step S101 of the foregoing method.

The multi-level index construction unit 402 is configured to construct the multi-level structure index based on the original knowledge graph. The operations performed by the multi-level index construction unit 402 are the same as step S102 of the foregoing method.

The query unit 403 is configured to parse a query statement to obtain target vocabulary, and to search, according to the inverted-file hash list and the multi-level structure index, for the triples corresponding to the target vocabulary to generate a result subgraph. The operations performed by the query unit 403 are the same as step S103 of the foregoing method.

Preferably, the hash list establishing unit 401 is configured to perform the following steps:

Preferably, the multi-level index construction unit 402 is configured to perform the following steps:

Preferably, the query unit 403 is configured to perform the following steps:

For each word in the vocabulary set, finding the corresponding disk index sets {S₁, S₂, ..., S_n} in parallel in the inverted-file hash list, and intersecting them to obtain the disk index intersection S;
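The intersection step can be sketched in Python as follows. The sketch assumes that the inverted-file hash list is available as a dictionary mapping each word to a set of disk index positions; that shape and the function name are assumptions for illustration. The lookups are done sequentially here for brevity, whereas the step above performs them in parallel.

```python
def intersect_disk_indexes(inverted_hash_list, words):
    """Intersect the disk index sets S_1..S_n of the given words to obtain S."""
    # A word absent from the hash list contributes an empty set, which makes
    # the intersection empty.
    index_sets = [inverted_hash_list.get(word, set()) for word in words]
    if not index_sets:
        return set()
    return set.intersection(*index_sets)
```

With a hash list in which "a" maps to {1, 2, 3} and "b" maps to {2, 3, 4}, the call for ["a", "b"] yields the set {2, 3}.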

It should further be noted that the real-time query system for a large-scale knowledge graph under limited memory provided by the embodiments of the present invention may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, as shown in FIG. 4, the system in the logical sense is formed when the CPU of the device on which it resides reads the corresponding computer program instructions from non-volatile memory into main memory and runs them.

In conclusion compared with prior art, the present invention greatly improves single machine knowledge mapping query capability, it can The result set for not only meeting user time demand but also meeting user's accuracy requirement is provided in the case where memory is extremely limited.It is existing Knowledge mapping inquiry system is to provide based on complete query processing ability, in the case of having ignored current this knowledge huge explosion The demand that personal user inquires knowledge mapping consumes the result that a large amount of memory headroom is found and has also exceeded ordinary user's Data understandability.

The present invention takes into account the relationship between the user's needs and the capability of the user's device: the inverted index and the structure index improve the data processing capability of the user's single machine, and approximate query processing, together with an automated structural understanding of the large-scale knowledge graph, provides the user with a suitable result set.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified or equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A real-time query method for a large-scale knowledge graph under limited memory, characterized by comprising:

parsing a query statement to obtain target vocabulary, and searching, according to the inverted-file hash list and the multi-level structure index, for the triples corresponding to the target vocabulary to generate a result subgraph.

2. The method according to claim 1, characterized in that said processing and analyzing the original knowledge graph to obtain the inverted-file hash list comprises:

3. The method according to claim 1, characterized in that said constructing the multi-level structure index based on the original knowledge graph comprises:

performing data classification, cleaning, and simplified data representation on the preliminary structure discovery result of the original knowledge graph to obtain a knowledge graph data classification and simplification result;

4. The method according to any one of claims 1 to 3, characterized in that the step of parsing the query statement to obtain the target vocabulary, and searching, according to the inverted-file hash list and the multi-level structure index, for the triples corresponding to the target vocabulary to generate the result subgraph comprises:

receiving a query statement Q input by the user, a lower limit min on the number of returned tuples, an upper limit max on the number of returned tuples, and a sampling ratio δ;

if so, for any index position in the disk index intersection S, adding the index and its location information to the result subgraph as a node;

otherwise, judging whether the length of the disk index intersection S is greater than the upper limit max on the number of returned tuples; if so, setting the sample size to max; otherwise, setting the sample size to the product of the length of the disk index intersection S and the sampling ratio δ, and, if that sample size is less than the lower limit min on the number of returned tuples, setting the sample size to min; after the sample size is determined, performing semi-random sampling on the disk index intersection S, and adding each sampled index, together with its location information in the multi-level structure index, to the result subgraph.

5. A real-time query system for a large-scale knowledge graph under limited memory, characterized by comprising: a hash list establishing unit, a multi-level index construction unit, and a query unit;

the hash list establishing unit being configured to process and analyze the original knowledge graph to obtain an inverted-file hash list;

the multi-level index construction unit being configured to construct a multi-level structure index based on the original knowledge graph;

the query unit being configured to parse a query statement to obtain target vocabulary, and to search, according to the inverted-file hash list and the multi-level structure index, for the triples corresponding to the target vocabulary to generate a result subgraph.

6. The system according to claim 5, characterized in that the hash list establishing unit is configured to perform the following steps:

7. The system according to claim 5, characterized in that the multi-level index construction unit is configured to perform the following steps:

8. The system according to any one of claims 5 to 7, characterized in that the query unit is configured to perform the following steps: