CN102999563A - Network resource semantic retrieval method and system based on resource description framework - Google Patents

Network resource semantic retrieval method and system based on resource description framework

Info

Publication number
CN102999563A
Authority
CN
China
Prior art keywords
rdf
resource
data
relational database
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104339379A
Other languages
Chinese (zh)
Inventor
黎明
吴少智
陈佳
吴跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI UESTC TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
WUXI UESTC TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUXI UESTC TECHNOLOGY DEVELOPMENT Co Ltd filed Critical WUXI UESTC TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN2012104339379A priority Critical patent/CN102999563A/en
Publication of CN102999563A publication Critical patent/CN102999563A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network resource semantic retrieval method and system based on the resource description framework. The resource description method provided by the Resource Description Framework (RDF) is used to perform data modeling on heterogeneous resources on the Web, and a unified RDF-based description is produced from these models to support semantic-based information retrieval at query time. Resources that were physically scattered in the original system are thereby logically aggregated through the unified RDF-based description, Web resources are used more effectively, and resources can be shared across different platforms. A resource semantic related library is introduced to store semantically related resources, so that dynamic changes of Web resources are handled and recall and precision are guaranteed. A dynamic screening algorithm based on Bayesian decision theory ensures that frequently queried objects are placed in the cache memory, effectively shortening the time needed to return query results.

Description

Network resource semantic retrieval method and system based on resource description framework
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a network resource semantic retrieval method and system based on the resource description framework.
Background technology
As global informatization accelerates, the volume of information on the network keeps growing, and the demands placed on information retrieval methods grow with it. Most retrieval methods are global search techniques based on keyword matching, and their query results are often incomplete or irrelevant to the question asked. Semantic retrieval overcomes the limitation of mechanical string matching, which looks only at surface form: it analyzes and processes the user's retrieval request at the semantic level expressed by network (Web) resource information, raising information retrieval from the keyword level to the knowledge (or concept) level, with some ability to understand and process knowledge. Some semantic search engines already exist, but these techniques are still largely confined to processing static information, which cannot fully satisfy retrieval requirements for Web resources that change constantly.
Semantic-based retrieval represents a new direction in search engine development; its advantages and capabilities will gradually become apparent and will affect people's work, study, and life. Discussion of semantic retrieval appeared in papers of the international information retrieval conference (SIGIR) as early as the 1980s, but research on semantic retrieval was long constrained by the level of development of semantic information processing. With the development of natural language processing and artificial intelligence, and especially the rise and development of Semantic Web technology, semantic retrieval research has advanced rapidly since the end of the last century. Although there is still no unified definition of semantic retrieval, the different lines of research have one thing in common: they all aim at retrieval built on more efficient semantic processing of Web resources. Semantic information can be extracted and processed with Semantic Web methods and techniques or with natural language processing techniques; at present the former is more common in semantic retrieval research. Indeed, it is the emergence and development of the Semantic Web that has given semantic retrieval research a clearer focus and such rapid progress.
At present, some research on semantic-based information retrieval and related theory has been carried out abroad. Among the more prominent recent results in the field of concept retrieval, the University of Illinois and the University of Arizona in the United States, for example, developed a topic concept space (ITO Space) and its concept map (ITO Map) based on abstracts of defense science and technology research reports from the Information Technology Office (ITO) of the U.S. Defense Advanced Research Projects Agency, as well as a cancer concept space (Cancer Space) and its cancer concept map (Cancer Map) based on a U.S. cancer medical database. Domestic research mainly concentrates on discussing the characteristics and architectural models of semantic search engines and on implementation methods for concept retrieval; for example, Tang Peili proposed a "Net to Net" method for implementing concept retrieval. Existing search engines are still far from being able to analyze and understand natural-language semantics as a person does, and will not reach that level in the near future. Although some companies abroad have built concept-based products, these work only at the pragmatic level and do not yet touch the semantic level. For Chinese search engines, work in this area has only just begun because of the difficulties of Chinese language processing. In addition, current retrieval techniques for the Semantic Web concentrate mainly on describing static information and do not consider content that is dynamic and constantly changing. How to perform semantic queries over a series of semantically related transactions has therefore become a new direction for research on semantic-based querying. Semantic-based retrieval technology is urgently needed in practice, and current technology is still far from meeting that need.
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide a network resource semantic retrieval method and system based on the resource description framework. Using techniques such as the resource description provided by the Resource Description Framework (RDF), the invention supports semantic-based retrieval of Web resources at query time, handles dynamic changes of resources on the Web, and ensures that frequently queried objects are placed in the cache memory (Cache), thereby guaranteeing recall and precision and effectively shortening the time needed to return query results.
To achieve this object, the present invention adopts the following technical solution:
A network resource semantic retrieval method based on the resource description framework comprises the following steps:
A. performing data modeling on heterogeneous resources on the Web, and describing the Web resources based on the Resource Description Framework (RDF) to generate RDF data;
B. storing the RDF data as records of a relational database in the form of triples, completing the storage of the RDF data;
C. dynamically screening the objects in the cache memory (Cache) using Bayesian decision theory;
D. the user submitting an RDF query request, and converting the RDF query request into an SQL statement that the relational database can process;
E. querying the Cache with the SQL statement; if the required data are found, returning them to the user; if not, communicating directly with the relational database engine, obtaining the data from the relational database, and returning them to the user;
F. establishing a distributed resource semantic related library that stores Web resources that are semantically related; when an RDF request submitted by the user is received, the relational database engine first queries the resource semantic related library; if the required data are found, they are returned to the user; if not, the data are obtained from the relational database and returned to the user.
In particular, step A specifically comprises:
A1. performing data modeling with the original heterogeneous resources on the Web as the data source, in order to establish new types and define their attributes; if the original heterogeneous resources in the system are organized as files, the files serve as the data source; if they are organized as a database, the data in the database serve as the data source;
A2. describing the Web resources using the vocabulary built into RDF together with the vocabulary defined during the data modeling of step A1, and generating RDF files organized in Extensible Markup Language (XML) format.
In particular, step B specifically comprises:
establishing a correspondence between the Resource Description Framework Schema (RDF Schema, RDFS) and the entity-relationship model (E-R model), converting the RDFS into an E-R model, building the relational database according to this E-R model, and storing the RDF data as records of the relational database in the form of triples, completing the storage of the RDF data.
In particular, step C specifically comprises:
based on Bayesian decision theory, building an optimizer that uses the feature attributes selected by the maximum entropy principle as attributes and the RDF records that have been queried as the training data set; during idle time between queries, the optimizer computes over the RDF records to obtain the posterior probability that each will be called, and the RDF objects with the highest posterior probability are loaded into the cache; when the cache is full, the probability that each RDF object in the cache will be hit by a query is computed, and the RDF object with the lowest probability is swapped out.
In particular, converting the RDF query request into an SQL statement that the relational database can process in step D specifically comprises:
establishing a query engine between the user and the relational database and encapsulating query language conversion rules in it; when the user submits an RDF query request to the query engine, the query engine converts the request into an SQL statement that the relational database can process according to those conversion rules.
In particular, in step D any one of the RDF query languages RQL, SquishQL, SPARQL and DQL is selected as the user interface, and the query engine performs the conversion between the RDF query language and SQL.
In particular, in step F the resource semantic related library is built dynamically during user queries, and only resources whose semantic relevance to the query condition exceeds a set threshold are added to it.
The invention also discloses a network resource semantic retrieval system based on the resource description framework, comprising a SPARQL query interface, a SPARQL/SQL converter, a Cache, a batch RDF/XML file import interface, a relational database engine, and a relational database;
the SPARQL query interface is used by the user to submit SPARQL query requests, and returns the Web resources obtained to the user in the form of RDF files or XML files;
the SPARQL/SQL converter converts the SPARQL query request into an SQL statement, implementing the conversion between SPARQL and SQL and presenting a unified SPARQL query interface to the user;
the Cache, based on Bayesian decision theory, builds an optimizer that uses the feature attributes selected by the maximum entropy principle as attributes and the RDF records that have been queried as the training data set; during idle time between queries, the optimizer computes over the RDF records to obtain the posterior probability that each will be called, and the RDF objects with the highest posterior probability are loaded into the cache; when the cache is full, the probability that each RDF object in the cache will be hit by a query is computed, and the RDF object with the lowest probability is swapped out;
the batch RDF/XML file import interface is used to input RDF data or RDFS data in RDF or XML file format;
the RDF/XML document parser obtains RDF or XML files from the batch RDF/XML file import interface, extracts the subject, predicate, and object of each triple according to the function of each tag in the file, and stores them in the relational database through the relational database engine;
the relational database engine stores the RDF data in the relational database and provides the interface for operating on the relational database.
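The triple extraction performed by the RDF/XML document parser can be sketched as follows. This is a minimal illustration, not the parser of the invention: it assumes a flat RDF/XML file made of `rdf:Description` elements with an `rdf:about` attribute, and the function name `extract_triples` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_triples(rdf_xml):
    """Return a list of (subject, predicate, object) from flat rdf:Description elements."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; join them into a predicate URI
            predicate = prop.tag[1:].replace("}", "", 1)
            # Object is either a resource reference or the literal text of the element
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples

# Hypothetical sample input
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:title>RDF primer</dc:title>
    <dc:relation rdf:resource="http://example.org/doc/2"/>
  </rdf:Description>
</rdf:RDF>"""

parsed = extract_triples(SAMPLE)
```

Nested descriptions, blank nodes, and typed literals are outside the scope of this sketch; a full parser would have to handle them before handing triples to the relational database engine.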
The present invention uses techniques such as the resource description provided by the Resource Description Framework (RDF) to perform data modeling on heterogeneous resources on the Web, and then produces a unified RDF-based description from these models, thereby supporting semantic-based information retrieval at query time. Resources that were physically scattered in the original system are logically aggregated through the unified RDF-based description, so that Web resources can be used more effectively and resources can be shared across different platforms. The resource semantic related library, which stores semantically related resources, makes it possible to handle dynamic changes of resources on the Web and guarantees recall and precision. A dynamic screening algorithm based on Bayesian decision theory ensures that frequently queried objects are placed in the cache memory (Cache) and are queried first, which effectively shortens the time needed to return query results.
Description of drawings
Fig. 1 is a flowchart of the network resource semantic retrieval method based on the resource description framework provided by an embodiment of the invention;
Fig. 2 is a block diagram of the network resource semantic retrieval system based on the resource description framework provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the principle of the SPARQL/SQL converter provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of the principle of the Cache provided by an embodiment of the invention.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments.
Referring to Fig. 1, Fig. 1 is a flowchart of the network resource semantic retrieval method based on the resource description framework provided by an embodiment of the invention.
The network resource semantic retrieval method based on the resource description framework in this embodiment comprises the following steps:
Step S101: perform data modeling on heterogeneous resources on the Web, and describe the Web resources based on the Resource Description Framework (RDF) to generate RDF data. The data modeling and resource description proceed as follows:
Step S1011: perform data modeling with the original heterogeneous resources on the Web as the data source, in order to establish new types and define their attributes. If the original heterogeneous resources in the system are organized as files, the files serve as the data source; if they are organized as a database, the data in the database serve as the data source. The data modeling method itself is independent of how the resource system is organized, so it applies without loss of generality.
Step S1012: describe the Web resources using the vocabulary built into RDF together with the vocabulary defined during the data modeling of step S1011, and generate RDF files organized in Extensible Markup Language (XML) format. With this description method, resources that were physically scattered in the original system are logically aggregated, so that Web resources can be used more effectively and resources can be shared across different platforms.
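As an illustration of step S1012, the sketch below serializes one modeled resource as RDF/XML. It is a sketch under stated assumptions: the Dublin Core element namespace stands in for the custom vocabulary of step S1011, and the function name `describe_resource` and the example URI are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed vocabulary for this example

def describe_resource(uri, properties):
    """Return an RDF/XML string describing one resource.

    `properties` maps Dublin Core element names (e.g. "title") to literal values.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri})
    for name, value in properties.items():
        # One child element per property; the element text is the literal value
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

xml_text = describe_resource(
    "http://example.org/doc/1", {"title": "RDF primer", "creator": "Alice"}
)
```

Whether the source record comes from a file or a database row only changes how `properties` is filled in, which mirrors the point that the modeling method is independent of how the resources are organized.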
Step S102: store the RDF data as records of a relational database in the form of triples, completing the storage of the RDF data.
A correspondence is established between the Resource Description Framework Schema (RDF Schema, RDFS) and the entity-relationship model (E-R model), the RDFS is converted into an E-R model, the relational database is built according to this E-R model, and the RDF data are stored as records of the relational database in the form of triples. Specifically, the respective characteristics of RDFS and relational databases are analyzed and a correspondence between the two is established; starting from the RDFS, the classes, attributes, and so on that it describes are converted into the corresponding entities, entity attributes, and entity relationships of the E-R model. A suitable relational database is then designed from this E-R model and the RDF data are stored in it as triples, which stores the RDF data effectively and provides the physical support at the semantic layer for RDF queries.
The design of the relational storage model mainly balances storage space against query efficiency. A schema-oblivious scheme that stores triples directly is simple to implement and very efficient to query, but it repeatedly stores namespace prefixes and resources that have multiple attributes. To balance time and space, the database therefore builds on the schema-oblivious scheme and introduces, in addition to the triple table, a resource table, a namespace table, and a literal table. The subject, predicate, and object stored in the triple table are then only index values into the resource table or the literal table. Following the approach of Jena2, the database may also contain several triple tables, with different RDF files mapped to different triple tables, which limits the size of each table. Jena2 is a Java API (application programming interface). The following table lists several table structures commonly used for RDF/RDFS storage.
Table               Table structure
Resource table      Resource(ID, NSID, ResName)
Literal table       Literal(ID, Val)
Namespace table     NS(ID, Val)
Triple table        Triples(ID, Subject, Predicate, Object, ResFlag)
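The table structures listed above can be tried out with an in-memory SQLite database. This is a sketch only: the column types, the helper names `resource_id` and `add_literal_triple`, and the reading of `ResFlag` as "0 means the object is a literal" are assumptions, since the text does not define them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE NS       (ID INTEGER PRIMARY KEY, Val TEXT UNIQUE);
CREATE TABLE Resource (ID INTEGER PRIMARY KEY, NSID INTEGER REFERENCES NS(ID), ResName TEXT);
CREATE TABLE Literal  (ID INTEGER PRIMARY KEY, Val TEXT);
CREATE TABLE Triples  (ID INTEGER PRIMARY KEY, Subject INTEGER, Predicate INTEGER,
                       Object INTEGER, ResFlag INTEGER);
"""

def resource_id(conn, namespace, name):
    """Return the Resource.ID for namespace + name, inserting rows as needed."""
    conn.execute("INSERT OR IGNORE INTO NS (Val) VALUES (?)", (namespace,))
    nsid = conn.execute("SELECT ID FROM NS WHERE Val = ?", (namespace,)).fetchone()[0]
    row = conn.execute(
        "SELECT ID FROM Resource WHERE NSID = ? AND ResName = ?", (nsid, name)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO Resource (NSID, ResName) VALUES (?, ?)", (nsid, name)
    ).lastrowid

def add_literal_triple(conn, subject, predicate, text):
    """Store (subject, predicate, "text"); ResFlag = 0 marks a literal object (assumed meaning)."""
    s = resource_id(conn, *subject)
    p = resource_id(conn, *predicate)
    o = conn.execute("INSERT INTO Literal (Val) VALUES (?)", (text,)).lastrowid
    conn.execute(
        "INSERT INTO Triples (Subject, Predicate, Object, ResFlag) VALUES (?, ?, ?, 0)",
        (s, p, o),
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
DC = "http://purl.org/dc/elements/1.1/"
add_literal_triple(conn, ("http://example.org/doc/", "1"), (DC, "title"), "RDF primer")
add_literal_triple(conn, ("http://example.org/doc/", "2"), (DC, "title"), "SPARQL notes")
```

After the two inserts, each namespace string is stored once in `NS` and the `title` predicate once in `Resource`, while `Triples` holds only integer indexes, which is the space saving the multi-table layout is meant to provide.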
Introducing several tables reduces repeated storage of identical information, but a query must then join several tables, which inevitably lengthens the query response time. For this reason, the compromise between space and time efficiency used in Jena2 is adopted: a resource string or literal string is stored separately in the resource table or literal table only when it exceeds a certain length, while relatively short literal or resource information is stored directly in the Triples table. Under this scheme, however, a field of the Triples table (for example, the Object field) may hold either an index value into the literal table or a character string. How to design the table structure so that one field can be either an integer index or a variable-length string is a problem left for further research.
To improve query efficiency, a dedicated attribute table can be set up for frequently queried attributes or for the attributes of the Dublin Core, so that querying the triples of these common attributes does not require joining several tables, which effectively shortens the query response time. Two attribute tables are introduced initially: DC Property and Common Property. The DC Property table has one field for each commonly used Dublin Core attribute; its structure is shown in the following table, where ID is the index field.
ID | Subject | Title | Creator | Publisher | Type | Description | Date
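The benefit of the DC Property table is that a common lookup becomes a single-table query. The sketch below illustrates this with SQLite; the table name `DCProperty`, the column types, the sample rows, and the reading of `Subject` as the identifier of the described resource are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE DCProperty (ID INTEGER PRIMARY KEY, Subject TEXT, Title TEXT, '
    'Creator TEXT, Publisher TEXT, "Type" TEXT, Description TEXT, "Date" TEXT)'
)
conn.executemany(
    'INSERT INTO DCProperty (Subject, Title, Creator, "Date") VALUES (?, ?, ?, ?)',
    [
        ("http://example.org/doc/1", "RDF primer", "Alice", "2012-11-02"),
        ("http://example.org/doc/2", "SPARQL notes", "Bob", "2012-11-05"),
    ],
)

# One single-table lookup, with no join against resource or literal tables
titles_by_alice = conn.execute(
    "SELECT Subject, Title FROM DCProperty WHERE Creator = ?", ("Alice",)
).fetchall()
```

The trade-off is that the same values are stored redundantly alongside the triple tables, so the attribute table has to be kept in step with them.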
The structure of the table for frequently queried attributes (the Common Property table) changes dynamically. Based mainly on the recent query history, a performance prediction algorithm is used to build the structure of the Common Property table from the attributes queried most often recently or most likely to be queried in the future. To improve query efficiency the table can be placed in the cache, which is also part of the cache design studied here; to improve the cache hit rate, the Common Property table must also be updated periodically. Besides the tables above, a key generator table (Key Generation) is needed, mainly to generate the primary key values of each table.
Step S103: dynamically screen the objects in the cache memory (Cache) using Bayesian decision theory.
Based on Bayesian decision theory, an optimizer is built that uses the feature attributes selected by the maximum entropy principle as attributes and the RDF records that have been queried as the training data set. During idle time between queries, the optimizer computes over the RDF records to obtain the posterior probability that each will be called, and the RDF objects with the highest posterior probability are loaded into the cache. When the cache is full, the probability that each RDF object in the cache will be hit by a query is computed, and the RDF object with the lowest probability is swapped out.
Step S104: the user submits an RDF query request, and the RDF query request is converted into an SQL statement that the relational database can process.
A query engine is established between the user and the relational database, and query language conversion rules are encapsulated in it. When the user submits an RDF query request to the query engine, the query engine converts the request into an SQL statement that the relational database can process according to those conversion rules. Any one of the RDF query languages RQL, SquishQL, SPARQL and DQL may be selected as the user interface, and the query engine performs the conversion between the RDF query language and SQL. RQL is short for RDF Query Language and is one kind of RDF query language. SquishQL is one of the simplest RDF query languages. SPARQL (SPARQL Protocol and RDF Query Language) is a query language and data access protocol developed for RDF; it is defined in terms of the RDF data model developed by the W3C and can be used with any information resource that can be represented in RDF. DQL stands for Data Query Language.
Step S105: query the Cache with the SQL statement. If the required data are found, they are returned to the user; if not, the query engine communicates directly with the relational database engine, obtains the data from the relational database, and returns them to the user.
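The lookup order of step S105 can be sketched as a small wrapper. This is an illustrative assumption about the interface, not the implementation of the invention: the cache is modeled as a dictionary keyed by SQL text, and `make_lookup`, `query_database`, and `fake_database` are hypothetical names. The wrapper deliberately does not write to the cache, because in the method the cache contents are chosen by the Bayesian screening of step S103.

```python
def make_lookup(cache, query_database):
    """Return a function that answers SQL from `cache`, falling back to the database."""
    def lookup(sql):
        if sql in cache:
            # Cache hit: answer without touching the relational database
            return cache[sql], "cache"
        # Cache miss: ask the relational database engine directly
        return query_database(sql), "database"
    return lookup

# Usage with a stand-in for the relational database engine
database_calls = []

def fake_database(sql):
    database_calls.append(sql)
    return [("from database",)]

lookup = make_lookup({"SELECT 1": [("cached",)]}, fake_database)
hit = lookup("SELECT 1")
miss = lookup("SELECT 2")
```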
Step S106: establish a distributed resource semantic related library that stores Web resources that are semantically related. When an RDF request submitted by the user is received, the relational database engine first queries the resource semantic related library; if the required data are found, they are returned to the user; if not, the data are obtained from the relational database and returned to the user.
The resource semantic related library is built dynamically during user queries, and only resources whose semantic relevance to the query condition exceeds a set threshold are added to it. In this embodiment the threshold is set to 0.5.
Establishing a distributed resource semantic related library improves precision and shortens the time needed to return query results to the user. For example, suppose that the first time a user enters the query condition "RDF", the search engine finds the semantically related resources R1, R2, and R3. The system then automatically adds R2 and R3 to the resource semantic related library of R1, recording in detail that R2 and R3 are related to R1 with respect to "RDF", together with their respective degrees of relevance. The second time a user queries "RDF" and the search engine finds the semantically related resource R1, it checks whether the resource semantic related library of R1 is empty. If it is not empty, the search engine first queries that library in descending order of relevance, which gives high precision, and returns these results to the user first, effectively shortening the query feedback time. The search engine then searches outside the resource semantic related library of R1, returns further matching results to the user progressively, and updates the resource semantic related library of R1, which guarantees recall.
During a query, updates to the resource semantic related library can be delegated to a mobile agent so that they do not delay the return of query results. Mobile agents are autonomous, reactive, communicative, and mobile; these properties let them save network bandwidth, overcome network latency, encapsulate network protocols, support asynchronous autonomous execution, and remain platform independent. In addition, to keep the resource semantic related library from growing during queries and lengthening the time needed to return results, only resources whose semantic relevance to the query condition is greater than 0.5 are added, which effectively controls the size of each resource's semantic related library and simplifies management and querying. The resource semantic related library is thus built dynamically during user queries, which suits current Web application requirements and provides a good approach to semantic-based retrieval of Web resources.
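The dynamic construction of the resource semantic related library, with its 0.5 threshold and descending-relevance lookup, can be sketched as follows. The class name `SemanticRelatedLibrary`, its methods, and the relevance scores are hypothetical; how relevance is computed is not specified here and is simply passed in as a number.

```python
from collections import defaultdict

THRESHOLD = 0.5  # value used in this embodiment

class SemanticRelatedLibrary:
    """Per-resource store of related resources, keyed by (resource, query term)."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self._related = defaultdict(dict)  # (resource, term) -> {related: relevance}

    def record(self, resource, term, related, relevance):
        # Only resources whose relevance exceeds the threshold are kept
        if relevance > self.threshold:
            self._related[(resource, term)][related] = relevance

    def lookup(self, resource, term):
        # Return related resources in descending order of relevance
        entries = self._related.get((resource, term), {})
        return sorted(entries.items(), key=lambda item: item[1], reverse=True)

library = SemanticRelatedLibrary()
library.record("R1", "RDF", "R3", 0.6)
library.record("R1", "RDF", "R2", 0.9)
library.record("R1", "RDF", "R4", 0.4)  # below the threshold, so not stored
first_results = library.lookup("R1", "RDF")
```

In the scheme described above, the `record` calls would run in the background (for example, in a mobile agent) so that they do not delay the response to the user.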
Referring to Fig. 2, Fig. 2 is a block diagram of the network resource semantic retrieval system based on the resource description framework provided by an embodiment of the invention.
The network resource semantic retrieval system based on the resource description framework in this embodiment comprises a SPARQL query interface, a SPARQL/SQL converter, a Cache, a batch RDF/XML file import interface, a relational database engine, and a relational database.
The SPARQL query interface is used by the user to submit SPARQL query requests, and returns the Web resources obtained to the user in the form of RDF files or XML files.
The SPARQL/SQL converter converts the SPARQL query request into an SQL statement, implementing the conversion between SPARQL and SQL and presenting a unified SPARQL query interface to the user.
After comparing the strengths and weaknesses of the various RDF query languages, SPARQL was selected as the user query language of this embodiment because it is the most suitable for conversion to SQL; its similarities to and differences from SQL were analyzed, and conversion rules between the two were proposed.
Referring to Fig. 3, Fig. 3 is a schematic diagram of the principle of the SPARQL/SQL converter provided by an embodiment of the invention. The SPARQL/SQL converter uses a query language conversion module and a query result conversion module to convert SPARQL query statements into SQL statements and SQL query results into SPARQL results, respectively. Query language conversion mainly converts a SPARQL query statement into an SQL statement according to the corresponding relational algebra rules and equivalences and passes it to the relational database engine. This supports queries against the relational database and relies on the strong query performance and mature system functions of the relational database to query RDF data more efficiently.
Given the complexity of SPARQL, its language features are divided into modules, ordered, and converted in a targeted way: (1) basic operations, such as selection and projection; (2) UNION join operations; (3) OPTIONAL pattern queries; (4) FILTER conditional queries. The result returned by the relational database cannot be returned to the user directly, because data in the relational database carry no semantics. A query result conversion is therefore needed: the semantics of the data are reconstructed from the mapping relations defined by the RDF storage model, the SQL query result is converted into a SPARQL result, and tuples, an RDF view, or a Boolean value are returned according to the user's needs.
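For the first category (basic selection and projection), the conversion can be sketched as below. This is not the converter of the embodiment: it assumes the SPARQL query has already been parsed into triple patterns, it targets a simplified single table `triples(s, p, o)` of strings instead of the multi-table layout, and the name `bgp_to_sql` is hypothetical. UNION, OPTIONAL, and FILTER are not handled.

```python
import sqlite3

def bgp_to_sql(select_vars, patterns):
    """Translate triple patterns into one SQL SELECT over triples(s, p, o).

    Each pattern is a (subject, predicate, object) tuple; terms starting with
    "?" are variables, everything else is treated as a constant.
    """
    bindings = {}            # variable -> first column in which it appears
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            column = f"t{i}.{col}"
            if term.startswith("?"):
                if term in bindings:
                    # A repeated variable becomes a join condition
                    where.append(f"{column} = {bindings[term]}")
                else:
                    bindings[term] = column
            else:
                # A constant becomes a selection with a bound parameter
                where.append(f"{column} = ?")
                params.append(term)
    columns = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# SELECT ?title WHERE { ?doc dc:creator "Alice" . ?doc dc:title ?title }
sql, params = bgp_to_sql(
    ["?title"], [("?doc", "dc:creator", "Alice"), ("?doc", "dc:title", "?title")]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("doc1", "dc:creator", "Alice"), ("doc1", "dc:title", "RDF primer"),
    ("doc2", "dc:creator", "Bob"), ("doc2", "dc:title", "SPARQL notes"),
])
rows = conn.execute(sql, params).fetchall()
```

Each triple pattern maps to one alias of the triple table, so a pattern with n triples becomes an n-way self-join; this is the cost that the attribute tables and the cache described elsewhere in the embodiment are meant to reduce.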
Described Cache buffer memory is used for based on Bayesian decision theory, utilize characteristic attribute that principle of maximum entropy selects as attribute, and the RDF that will be queried record is as training dataset, make up optimizer, and described RDF record is calculated in the free time of inquiry by described optimizer, draw its invoked posterior probability, selecting the highest RDF object of described invoked posterior probability calls among the cache, when the cache capacity is expired, calculate that the RDF object will be queried the probability that hits among the cache, and the RDF object that will have a minimum probability swaps out.
Bayesian decision theory is a method of computing posterior probabilities from prior probabilities; its core is the well-known Bayes formula. It was proposed in the 18th century by Thomas Bayes, and its mathematical expression is:
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\sum_{j=1}^{n} \Pr(A_j)\,\Pr(B \mid A_j)}$$
In the above formula, Pr(A_i | B) is the posterior probability and Pr(A_i) is the prior probability. The formula links the prior and posterior probabilities of an event, using prior information and sample data to determine the posterior probability. It can be used to predict the probability that an event occurs given certain conditions that have already occurred.
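As a worked illustration with made-up numbers (they are assumptions, not values from the embodiment), let A_1 be the event that an RDF record will be queried in the next period, A_2 the event that it will not, and B the observation that the record was queried during the previous period:

```latex
\begin{align*}
\Pr(A_1) &= 0.2, & \Pr(B \mid A_1) &= 0.9,\\
\Pr(A_2) &= 0.8, & \Pr(B \mid A_2) &= 0.1,\\
\Pr(A_1 \mid B) &= \frac{0.2 \times 0.9}{0.2 \times 0.9 + 0.8 \times 0.1}
                 = \frac{0.18}{0.26} \approx 0.69 .
\end{align*}
```

Observing B thus raises the probability that the record will be queried from the prior 0.2 to a posterior of about 0.69.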
As shown in Fig. 4, this embodiment uses a Bayes-based algorithm to dynamically update the objects stored in the cache. When the optimizer is built, a maximum entropy feature extractor supplies the feature attributes selected by the maximum entropy method as attributes, and the RDF records that have been queried are used as the training data set. During idle time between queries, the resulting optimizer computes for all RDF records the posterior probability of being called, and the RDF objects with the highest probability are loaded into the cache in advance, which improves the cache hit rate. If the cache is full, the probability that each RDF object in the cache will be hit by a query is computed, and the RDF object with the lowest probability is swapped out.
The basic steps of the cache update optimization method based on Bayesian decision theory are as follows:
Step 1: Select RDF documents as the training data set D_train, and use the principle of maximum entropy to obtain the feature parameter set att_1, ..., att_m.
Step 2: Obtain the prior probabilities and conditional probabilities through training.
Step 3: For every RDF document in the relational database, compute the posterior probability that it will be hit by a query.
Step 4: Select the k RDF records with the highest posterior probability, which are considered the most likely to be queried, and put them into the Cache, where k is a positive integer.
Step 5: When the central processing unit (CPU) is idle, run Step 3 and Step 4 again to update the contents of the Cache, so that queries keep a high hit rate in the cache.
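The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive Bayes classifier with Laplace smoothing as one possible realization of the Bayesian scoring, it takes the feature attributes as already selected (the maximum entropy selection of Step 1 is not implemented), and the names train, posterior_hit, refresh_cache, and evict, as well as the sample history, are hypothetical.

```python
from collections import Counter, defaultdict

def train(history):
    """Step 2: count priors and conditionals from (features, was_hit) pairs."""
    class_counts = Counter(label for _, label in history)
    # feature_counts[label][i][value] = training rows of that class with that value
    feature_counts = {label: defaultdict(Counter) for label in class_counts}
    values = defaultdict(set)   # distinct values seen for each feature position
    for features, label in history:
        for i, v in enumerate(features):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def posterior_hit(model, features):
    """Step 3: posterior probability that a record with these features is hit."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    joint = {}
    for label, n in class_counts.items():
        p = n / total                                   # prior probability
        for i, v in enumerate(features):
            # Laplace-smoothed conditional probability of the feature value
            p *= (feature_counts[label][i][v] + 1) / (n + len(values[i]))
        joint[label] = p
    return joint.get(True, 0.0) / sum(joint.values())   # Bayes formula

def refresh_cache(model, records, k):
    """Step 4: ids of the k records with the highest posterior probability."""
    ranked = sorted(records, key=lambda r: posterior_hit(model, records[r]), reverse=True)
    return ranked[:k]

def evict(model, cached, records):
    """When the cache is full: the cached id with the lowest posterior probability."""
    return min(cached, key=lambda r: posterior_hit(model, records[r]))

# Hypothetical query log: (resource type, recency of last query) -> was it hit again?
history = [
    (("person", "recent"), True),
    (("person", "recent"), True),
    (("person", "old"), False),
    (("doc", "recent"), True),
    (("doc", "old"), False),
    (("doc", "old"), False),
]
records = {"r1": ("person", "recent"), "r2": ("doc", "old"), "r3": ("doc", "recent")}

model = train(history)
cache = refresh_cache(model, records, k=2)   # Step 5 would rerun this when the CPU is idle
print(cache, evict(model, cache, records))
```

Retraining on a growing query log and calling refresh_cache again during idle time corresponds to Step 5; how idle time is detected is left open here.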
A user's query of an RDF document is a discrete probabilistic event. The result of each query may coincide with the results of earlier queries. The central idea of the query efficiency optimization is therefore to use this historical information to predict, quickly and effectively, the objects that are used most frequently and to store them in the high-speed Cache, raising the cache hit rate as far as possible.
To adapt to queries changing dynamically over time, it is important to update the RDF objects stored in the cache periodically. Bayesian decision theory is introduced for this purpose: it predicts the objects that will be used less often and swaps them out of the Cache to maintain high query efficiency.
The batch RDF/XML file import interface is used to input RDF data or RDFS data in RDF or XML file format.
The RDF/XML document parser is used to obtain RDF or XML files from the batch RDF/XML file import interface, to extract the subject, predicate, and object of each triple according to the function of each tag in the file, and to store them in the relational database through the relational database engine.
The relational database engine is used to store RDF data in the relational database and provides the interface for operating on the relational database.
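A minimal sketch of such triple extraction is given below. It handles only a small subset of RDF/XML, namely top-level rdf:Description elements whose child elements are properties with either an rdf:resource attribute or a text literal. The function name extract_triples and the sample document are hypothetical, and a complete parser would have to follow the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from a simple RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree writes qualified tags as "{namespace}local";
            # the predicate IRI is the namespace followed by the local name.
            if prop.tag.startswith("{"):
                ns, _, local = prop.tag[1:].partition("}")
                predicate = ns + local
            else:
                predicate = prop.tag
            # The object is either a resource reference or a literal.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc1">
    <dc:title>Semantic retrieval</dc:title>
    <dc:creator rdf:resource="http://example.org/alice"/>
  </rdf:Description>
</rdf:RDF>"""

triples = extract_triples(SAMPLE)
for t in triples:
    print(t)
```

The resulting tuples are in the form that the relational database engine can store as records.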
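One way to picture this engine is a thin wrapper around a single triple table. The sketch below assumes SQLite and a table triples(s, p, o); the class name TripleStore and its methods add and match are hypothetical, and the embodiment's E-R based schema may well differ.

```python
import sqlite3

class TripleStore:
    """Stores RDF triples as rows and exposes a small query interface."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
            "PRIMARY KEY (s, p, o))"
        )

    def add(self, triples):
        """Insert triples in one transaction; exact duplicates are ignored."""
        with self.conn:
            self.conn.executemany(
                "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples
            )

    def match(self, s=None, p=None, o=None):
        """Return triples matching the given terms; None acts as a wildcard."""
        clauses, params = [], []
        for col, val in (("s", s), ("p", p), ("o", o)):
            if val is not None:
                clauses.append(f"{col} = ?")
                params.append(val)
        sql = "SELECT s, p, o FROM triples"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return self.conn.execute(sql + " ORDER BY s, p, o", params).fetchall()

store = TripleStore()
store.add([
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:bob"),
    ("ex:doc1", "dc:creator", "ex:alice"),   # duplicate, ignored
    ("ex:alice", "foaf:name", "Alice"),
])
print(store.match(p="dc:creator"))
```

Keeping every access behind add and match means the parser and the converter never touch the table layout directly, which is the role the text assigns to the engine.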
The present invention supports semantics-based retrieval of Web resources at query time, can handle dynamic changes of resources on the Web, and ensures that frequently queried objects are placed in the cache memory (Cache). It thereby maintains recall and precision while effectively shortening the time needed to return query results.
The above describes only preferred embodiments of the present invention and the technical principles applied. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A network resource semantic retrieval method based on the resource description framework, characterized in that it comprises the following steps:
A. performing data modeling on heterogeneous resources on the Web, describing the Web resources based on the resource description framework (RDF), and generating RDF data;
B. storing the RDF data in the form of triples as records of a relational database, thereby completing the storage of the RDF data;
C. dynamically screening the objects in the cache memory (Cache) using Bayesian decision theory;
D. the user submitting an RDF query request, and converting the RDF query request into an SQL statement that the relational database can process;
E. querying the Cache with the SQL statement; if the required data are found, returning the data to the user; if not, communicating directly with the relational database engine, obtaining the data from the relational database, and returning them to the user;
F. establishing a distributed resource semantic association library that stores Web resources that are semantically related; when an RDF request submitted by the user is received, the relational database engine first queries the resource semantic association library; if the required data are found, the data are returned to the user; if not, the data are obtained from the relational database and returned to the user.
2. The network resource semantic retrieval method based on the resource description framework according to claim 1, characterized in that step A specifically comprises:
A1. performing data modeling with the original heterogeneous resources on the Web as the data source, in order to establish new types and define the attributes of those types; if the original heterogeneous resources in the system are organized as files, the files are used as the data source; if they are organized as a database, the data in the database are used as the data source;
A2. describing the Web resources using the vocabulary provided by RDF together with the vocabulary defined during the data modeling of step A1, and generating RDF files organized in Extensible Markup Language (XML) format.
3. The network resource semantic retrieval method based on the resource description framework according to claim 2, characterized in that step B specifically comprises:
establishing the correspondence between RDF Schema (RDFS) and the entity-relationship model (E-R model), converting the RDFS into an E-R model, building the relational database according to this E-R model, and storing the RDF data in the form of triples as records of the relational database, thereby completing the storage of the RDF data.
4. The network resource semantic retrieval method based on the resource description framework according to claim 3, characterized in that step C specifically comprises:
building an optimizer based on Bayesian decision theory, taking the feature attributes selected by the principle of maximum entropy as attributes and the RDF records that have been queried as the training data set; the optimizer computing, during idle time between queries, the posterior probability that each RDF record will be called, and loading the RDF objects with the highest posterior probability into the cache; and, when the cache is full, computing the probability that each RDF object in the cache will be hit by a query and swapping out the RDF object with the lowest probability.
5. The network resource semantic retrieval method based on the resource description framework according to claim 4, characterized in that converting the RDF query request into an SQL statement that the relational database can process in step D specifically comprises:
establishing a query engine between the user and the relational database and encapsulating query language conversion rules in the query engine; when the user submits the RDF query request to the query engine, the query engine converts the RDF query request into an SQL statement that the relational database can process according to the language conversion rules.
6. The network resource semantic retrieval method based on the resource description framework according to claim 5, characterized in that in step D any one of the RDF query languages RQL, SquishQL, SPARQL, and DQL is selected as the user interface, and the query engine performs the conversion between the RDF query language and the SQL language.
7. The network resource semantic retrieval method based on the resource description framework according to claim 6, characterized in that in step F the resource semantic association library is established dynamically during the user's querying, and only resources whose semantic relevance to the query condition exceeds a set threshold are added to the resource semantic association library.
8. A network resource semantic retrieval system based on the resource description framework, characterized in that it comprises a SPARQL query interface, a SPARQL/SQL converter, a Cache, a batch RDF/XML file import interface, a relational database engine, and a relational database;
the SPARQL query interface is used by the user to submit SPARQL query requests and returns the retrieved Web resources to the user in the form of RDF or XML files;
the SPARQL/SQL converter is used to convert the SPARQL query request into an SQL statement, realizing the conversion between the SPARQL language and the SQL language and providing a unified SPARQL query interface to the user;
the Cache is used to build an optimizer based on Bayesian decision theory, taking the feature attributes selected by the principle of maximum entropy as attributes and the RDF records that have been queried as the training data set; during idle time between queries the optimizer computes for the RDF records the posterior probability of being called, and the RDF objects with the highest posterior probability are loaded into the cache; when the cache is full, the probability that each RDF object in the cache will be hit by a query is computed, and the RDF object with the lowest probability is swapped out;
the batch RDF/XML file import interface is used to input RDF data or RDFS data in RDF or XML file format;
the RDF/XML document parser is used to obtain RDF or XML files from the batch RDF/XML file import interface, to extract the subject, predicate, and object of each triple according to the function of each tag in the file, and to store them in the relational database through the relational database engine;
the relational database engine is used to store RDF data in the relational database and provides the interface for operating on the relational database.
CN2012104339379A 2012-11-01 2012-11-01 Network resource semantic retrieval method and system based on resource description framework Pending CN102999563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104339379A CN102999563A (en) 2012-11-01 2012-11-01 Network resource semantic retrieval method and system based on resource description framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012104339379A CN102999563A (en) 2012-11-01 2012-11-01 Network resource semantic retrieval method and system based on resource description framework

Publications (1)

Publication Number Publication Date
CN102999563A true CN102999563A (en) 2013-03-27

Family

ID=47928131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104339379A Pending CN102999563A (en) 2012-11-01 2012-11-01 Network resource semantic retrieval method and system based on resource description framework

Country Status (1)

Country Link
CN (1) CN102999563A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133553A1 (en) * 2006-12-04 2008-06-05 Microsoft Corporation Building, viewing, and manipulating schema sets
CN101216851A (en) * 2008-01-11 2008-07-09 孟小峰 Ontology data administrative system and method
CN101482875A (en) * 2008-12-24 2009-07-15 中国移动通信集团北京有限公司 Information query method and apparatus
CN101593180A (en) * 2008-05-30 2009-12-02 国际商业机器公司 The SPARQL inquiry is changed into the method and apparatus of SQL query
CN101853257A (en) * 2009-03-31 2010-10-06 国际商业机器公司 System and method for transformation of SPARQL query
CN101901247A (en) * 2010-03-29 2010-12-01 北京师范大学 Vertical engine searching method and system for domain body restraint
CN102693310A (en) * 2012-05-28 2012-09-26 无锡成电科大科技发展有限公司 Resource description framework querying method and system based on relational database
CN102722542A (en) * 2012-05-23 2012-10-10 无锡成电科大科技发展有限公司 Resource description framework (RDF) graph pattern matching method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨琴: "基于关系数据库的RDF存储与查询的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105706078A (en) * 2013-10-09 2016-06-22 谷歌公司 Automatic definition of entity collections
CN105706078B (en) * 2013-10-09 2021-08-03 谷歌有限责任公司 Automatic definition of entity collections
CN105723366B (en) * 2013-11-22 2020-05-19 爱克发医疗保健公司 Method for preparing a system for searching a database and system and method for executing a query to a connected data source
CN105723366A (en) * 2013-11-22 2016-06-29 爱克发医疗保健公司 Method for preparing a system for searching databases and system and method for executing queries to a connected data source
CN104750709A (en) * 2013-12-26 2015-07-01 中国移动通信集团公司 Semantic retrieval method and semantic retrieval system
CN103823855B (en) * 2014-02-19 2017-01-18 天津大学 Chinese encyclopedic knowledge organization and integration method aiming at semantic network
CN104572970B (en) * 2014-12-31 2017-09-12 浙江大学 A kind of SPARQL query statements generation system based on ontology library content
CN104572970A (en) * 2014-12-31 2015-04-29 浙江大学 SPARQL inquire statement generating system based on ontology library content
CN109416706A (en) * 2016-06-02 2019-03-01 康维达无线有限责任公司 Semantic reasoning service is realized in M2M/IOT service layer
CN107728931A (en) * 2016-08-12 2018-02-23 西门子公司 Method and apparatus for data storage
US10740289B2 (en) 2016-08-12 2020-08-11 Siemens Aktiengesellschaft Method and apparatus for storing data
CN110447025A (en) * 2016-09-29 2019-11-12 康维达无线有限责任公司 It is enabled in Internet of Things semantic mashed up
CN108345622A (en) * 2017-01-25 2018-07-31 西门子公司 Model retrieval method based on semantic model frame and device
CN108694206A (en) * 2017-04-11 2018-10-23 富士通株式会社 Information processing method and device
CN107122486B (en) * 2017-05-09 2020-08-14 中国科学院计算机网络信息中心 Multi-element big data fusion method and system supporting BLOB
CN107122486A (en) * 2017-05-09 2017-09-01 中国科学院计算机网络信息中心 A kind of polynary big data fusion method and system for supporting BLOB
CN108959291A (en) * 2017-05-19 2018-12-07 腾讯科技(深圳)有限公司 Querying method and relevant apparatus
CN108959291B (en) * 2017-05-19 2023-03-24 腾讯科技(深圳)有限公司 Query method and related device
CN109710775A (en) * 2018-12-29 2019-05-03 北京航天云路有限公司 A kind of knowledge mapping dynamic creation method based on more rules
CN111125308A (en) * 2019-12-21 2020-05-08 深圳前海黑顿科技有限公司 Lightweight text fuzzy search method supporting semantic association
CN111460229A (en) * 2020-02-23 2020-07-28 华中科技大学 Method and system for optimizing JSON (Java Server object notation) analysis among single-user and multiple workloads
CN112637263A (en) * 2020-11-23 2021-04-09 国网电力科学研究院有限公司 Multi-data center resource optimization promotion method and system and storage medium
CN112637263B (en) * 2020-11-23 2022-11-11 国网电力科学研究院有限公司 Multi-data center resource optimization promotion method and system and storage medium
CN114020779A (en) * 2021-10-22 2022-02-08 上海卓辰信息科技有限公司 Self-adaptive optimization retrieval performance database and data query method
CN114020779B (en) * 2021-10-22 2022-07-22 上海卓辰信息科技有限公司 Self-adaptive optimization retrieval performance database and data query method
CN114297224A (en) * 2021-12-22 2022-04-08 重庆邮电大学 RDF-based heterogeneous data integration and query system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130327