CN103646032B - A database query method based on ontology and restricted natural language processing - Google Patents

A database query method based on ontology and restricted natural language processing


Publication number
CN103646032B
Authority
CN
China
Prior art keywords
data base
node
key
natural language
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310556508.5A
Other languages
Chinese (zh)
Other versions
CN103646032A (en)
Inventor
漆桂林
崔荣国
张慧
邓波
陆彬
杨成彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Ke Data Technology Co., Ltd.
Original Assignee
漆桂林
崔荣国
张慧
邓波
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 漆桂林, 崔荣国, 张慧, 邓波
Priority to CN201310556508.5A
Publication of CN103646032A
Application granted
Publication of CN103646032B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database query method based on ontology and restricted natural language processing, aimed at the problem of natural language querying of databases. The method converts the relational schema of the database into a corresponding ontology, builds a custom word-segmentation dictionary and a keyword index from the records in the database, extracts keywords from the user's natural language query, uses the semantic information in the ontology to find the associations among the keywords, and finally converts the natural language query into an SQL query.

Description

A database query method based on ontology and restricted natural language processing
Technical field
The invention belongs to the field of semantic querying and relates to a database query method based on ontology and restricted natural language processing.
Background art
With the development of the Internet, the amount of information it contains is growing geometrically. According to statistics, more than 70% of the data on the Internet is stored in databases, so effective database querying plays a very important role in contemporary web data analysis. Traditional database querying, however, requires professionals with a deep understanding of the schema information inside the database to construct suitable SQL query statements. Non-professionals, who lack specialized database knowledge, can do little when confronted with a huge database.
Natural language querying of databases is the product of combining natural language understanding with database technology. In recent years it has received attention as an intelligent interface technology within natural language understanding in artificial intelligence; combined in particular with Chinese handwriting and speech recognition, it has high theoretical value and broad application prospects. A natural language interface to a database allows the user to perform various operations on the database in natural language, which the system automatically converts into the operating language of the database, bringing great convenience to the user. A very challenging problem here is how to convert a natural language query statement accurately into the operating language of the database. Because corpora are lacking, traditional processing of natural language database queries based on part-of-speech tagging and syntactic analysis rarely reaches a practical level.
Surveying the systems developed in China in recent years, the techniques used are mainly a Chinese understanding model based on the E-R model of the database, conversion through a relational-algebra-like logical-formula intermediate language, condition-centered sentence pattern matching, and multilingual linking schemes. The CQI system of Southeast University, developed in the 1990s, was the first Chinese database query interface in China based on the E-R model, but the interface is restricted to imperative sentences and its vocabulary is limited. Chiql, developed by Renmin University of China and the Chinese University of Hong Kong, greatly improved query statement structure, vocabulary size, and natural language query conversion, but users are still heavily restricted, and it has not yet reached a practical level.
Summary of the invention
Technical problem: the present invention provides a user-friendly database query method based on ontology and natural language processing that automatically generates an SQL query statement from the natural language entered by the user and returns the query result to the user.
Technical solution: in the database query method based on ontology and natural language processing of the present invention, an ontology is first extracted from the relational schema of the database: each relational table is abstracted into a concrete class, and relations such as inheritance and equivalence may hold between classes; each column in a relational table is abstracted into a datatype property in the ontology; and each foreign key in the relational database is converted into an object property in the ontology. The ontology is then converted into a graph data structure. Combined with natural language processing techniques, the restricted natural language entered by the user is segmented into words, a keyword index is built, connected graphs linking the keywords are searched, and SQL conversion is performed, realizing the conversion from natural language to SQL (a database query and programming language).
The database query method based on ontology and natural language processing of the present invention comprises the following steps:
1) Convert the ontology constructed from the relational schema of the database into a graph data structure: each class in the ontology is converted into a class node; each datatype property is converted into an attribute node, and each attribute node has an edge connecting it to the class node it belongs to; each object property is converted into an edge connecting two class nodes;
2) Build the custom word-segmentation dictionary and the keyword index: read each record in the database in turn and add each record value read to a dictionary, which serves as the custom dictionary for segmenting user queries. At the same time, for each record read, form a key-value pair with the record value as the key and the names of the relational table and column in which the value occurs as the value, and store it in a non-relational database as the keyword index, which is used to locate a given keyword quickly and improves query efficiency;
3) After the system receives the user's restricted natural language query, use the custom dictionary built in step 2) to decompose the restricted natural language into multiple meaningful keywords;
4) Using each keyword obtained in step 3) as a key, look up the corresponding value in the keyword index, that is, find the relational table name and column name corresponding to the keyword; then find, in the graph data structure generated in step 1), the nodes corresponding to all of these table names and column names; finally, extract the connected components containing these nodes from the graph data structure as the search space;
5) Traverse the connected components of the search space built in step 4) and find all connected subgraphs that link together all the keywords in the search space. If no connected subgraph satisfying this condition can be found, find the connected subgraphs containing as many keywords as possible. Sort the connected subgraphs found in descending order of the number of keywords they contain; sort subgraphs containing the same number of keywords in ascending order of the number of edges they contain; finally select the k top-ranked connected subgraphs, where the value of k is determined by the size of the database and the number of connected subgraphs found;
6) Convert the k connected subgraphs selected in step 5), in ranked order, into SQL statements according to the following rules: fill the Select clause with *, indicating that all columns are returned; write the class nodes of the connected subgraph into the From clause; convert each edge connecting two class nodes into a foreign key relation and write it into the Where clause; and write each keyword entered by the user into the Where clause according to its corresponding relational table name and column name;
After the SQL statement is generated, the database is queried and the query result is returned to the user.
In a preferred embodiment of the present invention, the non-relational database in step 2) is MongoDB.
Beneficial effects: compared with the prior art, the present invention has the following advantages:
The method stores the semantic information between the relational tables of the database in an ontology, quickly constructs a search space by means of the keyword index, finds in the search space one or more minimal connected graphs containing all of the user's query keywords, and converts each minimal connected graph into an SQL query statement according to fixed rules, thereby effectively realizing semantic querying.
In this method the user's restricted natural language query is decomposed into multiple meaningful keywords, and the potential associations among the keywords are discovered by searching the corresponding graph data structure. The method therefore involves no part-of-speech tagging or syntactic analysis and is not restricted to particular sentence patterns. For example, the queries "student number of Zhang San" and "Zhang San of student number" are both decomposed into the keywords "Zhang San" and "student number", so the two queries return identical results.
In addition, unlike traditional methods based on the E-R model of the database, this method proposes a novel approach: the ontology extracted from the relational schema of the database is converted into a graph data structure, and by traversing this graph the connected subgraphs corresponding to the keywords in the user's query are found, realizing natural language querying of the database. A huge database may contain tens of thousands of records, yet its relational schema is comparatively simple. The relational schema is very small relative to the data, and every record in the database must have a corresponding relational schema, so the schema can be regarded as a unified standard abstracted from the data, to which all records in the database conform. The method extracts an ontology from the relational schema, converts it into a graph data structure, and uses the keyword index to extract from the graph all connected components corresponding to the keywords as the search space, which greatly narrows the search scope and thus greatly improves search efficiency.
The present invention stores the keyword index in a non-relational database. Non-relational databases have a flexible storage structure and are not bound by the ACID transactions of relational databases (Atomicity, Consistency, Isolation, Durability), which gives them a strong advantage over traditional relational databases for storing the keyword index.
Case analysis shows that, with the natural language database query method proposed by the present invention, users need no detailed knowledge of the database and no background in the SQL query language; entering restricted natural language is enough to query the database. The automatic conversion from natural language to SQL is fully transparent to the user. The present invention is a user-friendly method, and verification shows that it is feasible.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the basic process of the present invention;
Fig. 2 is a flow chart of search space construction in the present invention;
Fig. 3 is a flow chart of the query subgraph search algorithm of the present invention;
Fig. 4 is a schematic diagram of the PCDO graph data structure of the example;
Fig. 5 is a schematic diagram of the search space of the example;
Fig. 6 is a schematic diagram of the PCDO subgraph of the example.
Detailed description of the invention
The implementation of the present invention is described in detail below with reference to an embodiment and the accompanying drawings.
The relational database query method based on ontology and restricted natural language processing of the present invention comprises the following 6 steps:
1) Converting the ontology into the PCDO graph data structure: an ontology is a specification proposed by the World Wide Web Consortium (W3C) for describing all kinds of resource information on the World Wide Web. The ontology in the present invention is built according to fixed rules from the schema information of the relational database and describes the resource information in the database. The ontology construction rules are as follows:
(a) Building classes (class): for every relational table in the relational database, construct one corresponding class; classes and relational tables correspond one to one;
(b) Building datatype properties (dataTypeProperty): for every column c in each relational table t, construct one datatype property corresponding to c; the class this datatype property belongs to is the class corresponding to table t;
(c) Building object properties (objectProperty): for every foreign key f in the relational database that associates two relational tables t1 and t2, construct between the two classes corresponding to t1 and t2 one object property corresponding to f; the two classes connected by this object property are the classes corresponding to t1 and t2.
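Rules (a) to (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ontology` container, the `build_ontology` function, and the two-table schema are hypothetical names introduced here, and the `table_column` naming of properties merely mirrors the naming used in the worked example later in the description.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    classes: set = field(default_factory=set)
    datatype_properties: dict = field(default_factory=dict)  # property -> owning class
    object_properties: dict = field(default_factory=dict)    # property -> (class1, class2)


def build_ontology(tables, foreign_keys):
    """tables: {table name: [column names]}
    foreign_keys: [(table1, column1, table2, column2)]"""
    onto = Ontology()
    for table, columns in tables.items():
        onto.classes.add(table)                                    # rule (a)
        for column in columns:
            onto.datatype_properties[f"{table}_{column}"] = table  # rule (b)
    for t1, c1, t2, c2 in foreign_keys:
        onto.object_properties[f"{t1}_{c1}_{t2}_{c2}"] = (t1, t2)  # rule (c)
    return onto


# Hypothetical two-table schema, used only for illustration.
tables = {"student": ["name", "number"], "selection": ["number", "course_code"]}
foreign_keys = [("selection", "number", "student", "number")]
onto = build_ontology(tables, foreign_keys)
```

Each table yields one class, each column one datatype property owned by that class, and each foreign key one object property linking the two classes.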
The ontology contains only classes, object properties, and datatype properties. To make full use of the information in the ontology, we propose the PCDO graph data structure (hereinafter the PCDO graph). The PCDO graph consists mainly of two data structures, the node (Node) and the edge (Edge).
The data structure of a node is shown in Table 1:
Table 1. Data structure of a node
The node data structure contains six attributes: Type, Name, Edges, Keyword, Value, and KeywordType. The Type attribute identifies the type of the node, which is either class node (C_Node) or attribute node (P_Node); the Name attribute identifies the name of the node; the Edges attribute records all edges adjacent to the current node; the Keyword, Value, and KeywordType attributes are all set to empty during the conversion in step 1), and step 4) describes in detail how these three attributes are initialized.
The data structure of an edge is shown in Table 2:
Table 2. Data structure of an edge
The edge data structure contains four attributes: Type, Name, Node1, and Node2. The Type attribute identifies the type of the edge, which is either datatype property edge (D_Edge) or object property edge (O_Edge); the Name attribute identifies the name of the edge; the Node1 and Node2 attributes record the two nodes connected by the current edge. Because the PCDO graph is an undirected graph, the values of Node1 and Node2 are interchangeable.
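The two structures described above map naturally onto a pair of Python dataclasses. The sketch below is one possible encoding, not the layout used by the inventors: the field names follow the attributes listed in the text, while the `other` helper and the use of `None` for "empty" are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    type: str                             # "C_Node" or "P_Node"
    name: str
    edges: List["Edge"] = field(default_factory=list)
    keyword: Optional[str] = None         # filled in during step 4)
    value: Optional[int] = None           # numbering that separates nodes sharing a keyword
    keyword_type: Optional[str] = None    # "table", "column" or "value"


@dataclass(eq=False)
class Edge:
    type: str                             # "D_Edge" or "O_Edge"
    name: str
    node1: Node
    node2: Node

    def other(self, node: Node) -> Node:
        """Return the opposite endpoint; the graph is undirected."""
        return self.node2 if node is self.node1 else self.node1


# A class node, one of its attribute nodes, and the edge between them.
student = Node("C_Node", "student")
name = Node("P_Node", "student_name")
edge = Edge("D_Edge", "hasProperty", name, student)
student.edges.append(edge)
name.edges.append(edge)
```

Because Node1 and Node2 are interchangeable, traversal code can call `other` instead of caring which endpoint is stored first.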
The steps for converting the ontology into the PCDO graph are:
(1) Conversion of class nodes (C_Node): every class in the ontology is converted into a node whose Type attribute is set to "C_Node", whose Name attribute is set to the name of the corresponding class, and whose Edges attribute is set to empty;
(2) Conversion of attribute nodes (P_Node): every datatype property in the ontology is converted into a node whose Type attribute is set to "P_Node", whose Name attribute is set to the name of the corresponding datatype property, and whose Edges attribute is set to empty;
(3) Conversion of datatype property edges (D_Edge): an edge is added between each attribute node P and the class node C it belongs to. The Type attribute of the edge is set to "D_Edge", its Node1 and Node2 attributes are set to P and C respectively, and its Name attribute is set to "hasProperty"; the newly created edge is added to the Edges attributes of both P and C;
(4) Conversion of object property edges (O_Edge): every object property in the ontology is converted into an edge between the two class nodes C1 and C2 that it connects. The Type attribute of the edge is set to "O_Edge", its Node1 and Node2 attributes are set to C1 and C2 respectively, and its Name attribute is set to the name of the corresponding object property; the newly created edge is added to the Edges attributes of both C1 and C2.
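The four conversion steps can be sketched as a single function. The code is a minimal illustration: `ontology_to_pcdo`, the compact `Node` and `Edge` classes, and the sample ontology are hypothetical, and the ontology is passed in as plain Python collections rather than parsed from OWL.

```python
class Node:
    def __init__(self, type_, name):
        self.type, self.name, self.edges = type_, name, []
        self.keyword = self.value = self.keyword_type = None  # set later, in step 4)


class Edge:
    def __init__(self, type_, name, node1, node2):
        self.type, self.name, self.node1, self.node2 = type_, name, node1, node2
        node1.edges.append(self)   # register the new edge on both endpoints
        node2.edges.append(self)


def ontology_to_pcdo(classes, datatype_props, object_props):
    """classes: iterable of class names
    datatype_props: {property name: owning class}
    object_props: {property name: (class1, class2)}"""
    nodes = {}
    for cls in classes:                                    # step (1)
        nodes[cls] = Node("C_Node", cls)
    for prop, owner in datatype_props.items():
        nodes[prop] = Node("P_Node", prop)                 # step (2)
        Edge("D_Edge", "hasProperty", nodes[prop], nodes[owner])  # step (3)
    for prop, (c1, c2) in object_props.items():
        Edge("O_Edge", prop, nodes[c1], nodes[c2])         # step (4)
    return nodes


nodes = ontology_to_pcdo(
    ["student", "selection"],
    {"student_name": "student", "selection_number": "selection"},
    {"selection_number_student_number": ("selection", "student")},
)
```

After the call, the `student` class node carries two edges: one `hasProperty` edge to its attribute node and one object property edge to `selection`.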
2) Building the custom word-segmentation dictionary and the keyword index:
Traverse all relational tables in the database and read each record in each table in turn. Write each record value into the dictionary and, at the same time, form a key-value pair with the record value as the key and the names of the table and column in which it occurs as the value; these key-value pairs make up the keyword index. In a preferred embodiment of the present invention the non-relational database is MongoDB, and the key-value pairs are stored in MongoDB as the keyword index. The method is of course not limited to MongoDB; any non-relational (NoSQL) database can be used here.
The structure of a key-value pair, with an example, is shown in Table 3 below. Suppose the "student" table has the record "Zhang San" under its "name" column. When the record "Zhang San" is read, "Zhang San" is set as the key, the TableName attribute is set to "student", the table name corresponding to "Zhang San", and the ColumnName attribute is set to "name", the name of the column containing "Zhang San". Together they form a key-value pair that is stored in the keyword index. Once the dictionary has been built it is saved to disk as a file, and it is read from that location each time it is used. The purpose of the custom dictionary is to extract keywords from the natural language query entered by the user.
Table 3. Key-value pair structure and example
While each relational table is traversed, its table name and all of its column names are also used as keys to build key-value pairs of the form shown in Table 3, which are stored in the keyword index. For a table name, the TableName attribute is set to "table" and the ColumnName attribute is set to the name of the table; for a column name, the TableName attribute is set to "column" and the ColumnName attribute is set to the name of the column.
In addition, one keyword may correspond to several elements in the database. For example, "Zhang San" above may also occur in another column of another table, and that key-value pair must be stored in the keyword index as well. When the user queries "Zhang San", a lookup in the keyword index quickly returns all table names and column names corresponding to the word "Zhang San".
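The indexing pass described above can be sketched with the Python standard library. Several substitutions are assumptions made to keep the example self-contained: an in-memory SQLite database stands in for the relational database, a plain dictionary of sets stands in for the MongoDB collection, and table and column names are also added to the segmentation dictionary (the text only states that they go into the keyword index). The function name `build_index` is hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_index(conn):
    """Return (dictionary, index); index maps key -> {(TableName, ColumnName)}."""
    dictionary = set()
    index = defaultdict(set)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        dictionary.add(table)                       # assumption: names are segmentable too
        index[table].add(("table", table))          # table name entry
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            dictionary.add(column)
            index[column].add(("column", column))   # column name entry
            query = f'SELECT DISTINCT "{column}" FROM "{table}"'
            for (value,) in conn.execute(query):
                if value is not None:
                    dictionary.add(str(value))
                    index[str(value)].add((table, column))  # record value entry
    return dictionary, index


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, number TEXT)")
conn.execute("INSERT INTO student VALUES ('Zhang San', '09131011')")
dictionary, index = build_index(conn)
```

Storing a set per key is what lets one keyword, such as a name that occurs in several tables, resolve to all of its table and column pairs in a single lookup.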
3) Word segmentation with the custom dictionary: after the system receives the user's natural language query, it uses the custom dictionary built in step 2) to decompose the natural language into multiple meaningful keywords;
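The description does not name a segmentation algorithm. As one plausible sketch, the code below assumes forward maximum matching against the custom dictionary: at each position it takes the longest dictionary entry that starts there and skips characters that begin no entry. The function name `segment` and the sample dictionary are hypothetical.

```python
def segment(query, dictionary):
    """Dictionary-based forward maximum matching (assumed, not from the source)."""
    max_len = max(map(len, dictionary), default=0)
    keywords, i = [], 0
    while i < len(query):
        # Try the longest candidate first so "student number" beats "student".
        for size in range(min(max_len, len(query) - i), 0, -1):
            if query[i:i + size] in dictionary:
                keywords.append(query[i:i + size])
                i += size
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return keywords


dictionary = {"student", "student number", "Zhang San"}
first = segment("student number of Zhang San", dictionary)
second = segment("Zhang San of student number", dictionary)
```

Both word orders yield the same set of keywords, which is the property the method relies on when it ignores sentence patterns.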
4) Construction of the search space, described with reference to Fig. 2: for each keyword obtained in step 3), querying the keyword index yields all relational table names and column names corresponding to the word in the database. According to step 1), a relational table name corresponds to a class in the ontology, and that class corresponds to a class node in the PCDO graph; a column name in a relational table corresponds to a datatype property in the ontology, and that datatype property corresponds to an attribute node in the PCDO graph. When a keyword corresponds to several table names and column names, it corresponds to several nodes in the PCDO graph. Below, the nodes in the PCDO graph that a keyword corresponds to are called the mapping nodes of that keyword.
The Keyword attribute of a mapping node is set to the corresponding keyword. If a keyword has several mapping nodes, they are distinguished by the Value attribute, which takes a different number for each. The KeywordType attribute takes one of three values, "table", "column", or "value", determined from the keyword index: if the TableName value in the keyword index is "table", KeywordType is set to "table"; if the TableName value is "column", KeywordType is set to "column"; otherwise KeywordType is set to "value".
The connected components containing the mapping nodes of all keywords are extracted from the PCDO graph built in step 1) as the search space. The search space is necessarily a subset of the PCDO graph, so locating the connected components of the mapping nodes through the keyword index effectively narrows the search scope.
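A minimal sketch of this step follows, assuming the PCDO graph is available as an adjacency map from node name to neighbor names. The helper names (`keyword_type`, `component_of`, `build_search_space`) and the small graph are hypothetical; only the KeywordType rule and the idea of keeping the components that contain mapping nodes come from the text.

```python
from collections import deque


def keyword_type(table_name):
    """KeywordType rule: "table" and "column" entries keep their tag, the rest are values."""
    return table_name if table_name in ("table", "column") else "value"


def component_of(start, adjacency):
    """Collect every node reachable from start (one connected component)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def build_search_space(adjacency, mapping_nodes):
    """Keep only the components that contain at least one mapping node."""
    components = []
    for node in mapping_nodes:
        if not any(node in comp for comp in components):
            components.append(component_of(node, adjacency))
    return components


adjacency = {
    "student": {"student_name", "student_number"},
    "student_name": {"student"},
    "student_number": {"student"},
    "course": {"course_code"},          # unrelated component, no mapping node
    "course_code": {"course"},
}
space = build_search_space(adjacency, ["student", "student_number"])
```

The `course` component is dropped because no keyword maps into it, which is how the index lookup narrows the part of the graph that must be searched.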
5) Connected subgraph search, described with reference to Fig. 3: within the search space built in step 4), search for connected subgraphs containing all keywords. The main steps are as follows:
a) Select at random an unprocessed connected component of the search space, find all mapping nodes in this component, and put them into a set X.
b) If X contains n (n ≥ 2) mapping nodes Node1, Node2, ..., Noden with the same Keyword attribute, split X into n sets X1, X2, ..., Xn according to their differing Value attributes, assigning one of these nodes to each set; then add all mapping nodes of X other than Node1, Node2, ..., Noden to each of X1, X2, ..., Xn, and delete X.
c) Repeat step b) on X1, X2, ..., Xn until the Keyword attributes of the mapping nodes within each set are all different, finally obtaining m sets that cannot be split further;
d) Select any one of the m sets, denote it W, and select any mapping node in it as the start node of a breadth-first search (BFS);
e) During the BFS traversal, whenever a new mapping node is encountered, record the path from this node to the start node in the set Set and delete the newly encountered mapping node from W;
f) Repeat e) until all mapping nodes in W have been traversed. Set now records the nodes and edges that connect all mapping nodes of W, that is, a path linking all mapping nodes of W has been found. This path is a subset of the PCDO graph built in step 1); below, such a connected path is called a PCDO subgraph;
g) Repeat d) until all sets have been processed;
h) Repeat a) until all connected components of the search space have been processed;
i) Sort all PCDO subgraphs obtained in descending order of the number of keywords they contain; sort PCDO subgraphs containing the same number of keywords in ascending order of the number of edges they contain; finally select the k top-ranked PCDO subgraphs. A suitable value of k is determined by the size of the specific database or specified by the user; here k simply denotes a suitable number;
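Steps b) to i) can be sketched compactly under a few assumptions. Repeatedly splitting X until every set holds one node per keyword is read here as taking the Cartesian product of the mapping nodes of each keyword; the start node is chosen with `min` for reproducibility where the text picks arbitrarily; and the graph is a plain adjacency map. All function names and the sample graph are hypothetical.

```python
from collections import deque
from itertools import product


def undirected(pairs):
    """Build an adjacency map from a list of undirected edges."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def expand(mapping):
    """Steps b) and c): one candidate set per way of picking a single
    mapping node for every keyword."""
    keywords = sorted(mapping)
    return [dict(zip(keywords, combo))
            for combo in product(*(mapping[k] for k in keywords))]


def connect(adjacency, targets):
    """Steps d) to f): BFS from one mapping node, recording the path back
    to the start for every other mapping node that is reached."""
    start = min(targets)
    parent = {start: None}
    reached, edges = {start}, set()
    queue = deque([start])
    while queue and reached != targets:
        node = queue.popleft()
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            queue.append(neighbor)
            if neighbor in targets:
                reached.add(neighbor)
                walk = neighbor
                while parent[walk] is not None:   # walk back to the start node
                    edges.add(frozenset((walk, parent[walk])))
                    walk = parent[walk]
    return reached, edges


def top_k(adjacency, mapping, k):
    """Step i): rank by keywords covered (descending), then edges (ascending)."""
    ranked = []
    for choice in expand(mapping):
        reached, edges = connect(adjacency, set(choice.values()))
        covered = sum(node in reached for node in choice.values())
        ranked.append((covered, len(edges), choice))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:k]


graph = undirected([
    ("student", "student_number"), ("selection", "selection_number"),
    ("selection", "student"), ("selection", "course"),
])
mapping = {"number": ["student_number", "selection_number"], "course": ["course"]}
best = top_k(graph, mapping, k=2)
```

In this toy graph the keyword "number" maps to two attribute nodes. Both candidate subgraphs cover both keywords, so the one that reaches `course` through `selection_number` ranks first because it needs fewer edges.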
6) Generating SQL statements: each of the k PCDO subgraphs obtained in step 5) is converted into an SQL statement of the following form:
Select <query content>
From <data tables>
Where <query conditions>
The rules for converting a PCDO subgraph into SQL are:
a) The select clause is filled with "*", indicating that all columns of the rows satisfying the query conditions are returned to the user;
b) The from clause is filled with the relational tables corresponding to all class nodes in the PCDO subgraph;
c) The where clause is filled with the foreign key relations corresponding to the object property edges of the PCDO subgraph (nothing is added if there are no object property edges);
d) The where clause is also filled with the values corresponding to the attribute nodes in the PCDO subgraph.
After the SQL statement is generated, the database is queried through the database query interface, and the result is finally returned to the user.
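Rules a) to d) amount to simple string assembly. The sketch below assumes the subgraph has already been reduced to three lists (tables, foreign key edges, and attribute nodes carrying a value); the function name `subgraph_to_sql` and the sample names are hypothetical. Binding values through `?` placeholders rather than writing them into the statement is a choice made here, not something the description specifies.

```python
def subgraph_to_sql(class_nodes, object_edges, value_nodes):
    """class_nodes: [table name]
    object_edges: [(table1, column1, table2, column2)] foreign key relations
    value_nodes: [(table, column, value)] attribute nodes with a keyword value"""
    # Rule c): one join condition per object property edge.
    conditions = [f"{t1}.{c1} = {t2}.{c2}" for t1, c1, t2, c2 in object_edges]
    params = []
    # Rule d): one equality condition per attribute node that carries a value.
    for table, column, value in value_nodes:
        conditions.append(f"{table}.{column} = ?")
        params.append(value)
    # Rules a) and b): select everything from the tables of the class nodes.
    sql = "SELECT * FROM " + ", ".join(class_nodes)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = subgraph_to_sql(
    ["student", "selection"],
    [("selection", "number", "student", "number")],
    [("student", "number", "09131011")],
)
```

The returned pair can be passed to a database driver that supports positional parameters, for example `cursor.execute(sql, params)` in Python's `sqlite3` module.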
The implementation of the present invention is described in detail below with a simplified application example:
1) Converting the ontology into the PCDO graph: the relational database in this example records information about student course selection and contains the following:
Relational tables: student, curricula-variable (the course-selection table), course, institute
Column names: student _ name, student _ student number, curricula-variable _ course code name, curricula-variable _ student number, course _ department's code name, course _ course code name, institute _ department's code name, institute _ department's remarks
Foreign keys: curricula-variable _ student number=student _ student number, curricula-variable _ course code name=course _ course code name, course _ department's code name=institute _ department's code name
The ontology extracted from the schema information of this relational database is therefore an ontology describing student course selection. The details of the ontology are as follows:
Classes: student, curricula-variable, course, institute;
Datatype properties: student _ name, student _ student number, curricula-variable _ course code name, curricula-variable _ student number, course _ department's code name, course _ course code name, institute _ department's code name, institute _ department's remarks;
Object properties: student _ student number _ curricula-variable _ student number, curricula-variable _ course code name _ course _ course code name, course _ department's code name _ institute _ department's code name.
The ontology converted from the relational schema is as follows:
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.project.com/d2o_owl#"
    xml:base="http://www.project.com/d2o_owl">
  <owl:Ontology rdf:about=""/>
  <owl:ObjectProperty rdf:ID="curricula-variable _ student number _ student _ student number">
    <rdfs:range rdf:resource="#student"/>
    <rdfs:domain rdf:resource="#curricula-variable"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:ID="curricula-variable _ student number">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#curricula-variable"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="course _ course code name">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#course"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="curricula-variable _ course code name">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#curricula-variable"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="curricula-variable"/>
  <owl:DatatypeProperty rdf:ID="student _ student number">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#student"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="course"/>
  <owl:DatatypeProperty rdf:ID="course _ department's code name">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#course"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="student _ name">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#student"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="student">
  </owl:Class>
  <owl:ObjectProperty rdf:ID="curricula-variable _ course code name _ course _ course code name">
    <rdfs:range rdf:resource="#course"/>
    <rdfs:domain rdf:resource="#curricula-variable"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="course _ department's code name _ institute _ department's code name">
    <rdfs:range rdf:resource="#institute"/>
    <rdfs:domain rdf:resource="#course"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:ID="institute _ department's code name">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#institute"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="institute _ department's remarks">
    <rdfs:range rdf:resource="http://www.project.com/d2o_owl#character string type"/>
    <rdfs:domain rdf:resource="#institute"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="institute"/>
</rdf:RDF>
Using the conversion rules of step 1), the ontology is converted into a PCDO graph data structure, as shown in Figure 4.
Classes in the ontology are converted into class nodes of the PCDO graph (the oval nodes in the figure); datatype properties in the ontology are converted into attribute nodes of the PCDO graph (the rectangular nodes in the figure); each attribute node is connected to its corresponding class node by a hasProperty edge; object properties in the ontology are converted into edges connecting two class nodes.
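The conversion rules above can be illustrated with a minimal Python sketch. It is not the patented implementation: the PCDOGraph container, the ontology_to_pcdo function, and the shortened English names (student, selection, course, institute and their properties) are hypothetical stand-ins for the translated identifiers in the ontology listing, and the ontology is assumed to have already been parsed into plain Python collections.

```python
from dataclasses import dataclass, field


@dataclass
class PCDOGraph:
    """Hypothetical container for a PCDO graph."""
    class_nodes: set = field(default_factory=set)
    attribute_nodes: set = field(default_factory=set)
    # Each edge is a (source, label, target) triple.
    edges: set = field(default_factory=set)


def ontology_to_pcdo(classes, datatype_props, object_props):
    """Apply the three conversion rules.

    classes:        iterable of class names
    datatype_props: {property name: domain class}
    object_props:   {property name: (domain class, range class)}
    """
    graph = PCDOGraph()
    # Rule 1: every class becomes a class node.
    graph.class_nodes.update(classes)
    # Rule 2: every datatype property becomes an attribute node that is
    # linked to its class node by a hasProperty edge.
    for prop, domain in datatype_props.items():
        graph.attribute_nodes.add(prop)
        graph.edges.add((domain, "hasProperty", prop))
    # Rule 3: every object property becomes an edge between two class nodes.
    for prop, (domain, range_) in object_props.items():
        graph.edges.add((domain, prop, range_))
    return graph


# Assumed, simplified version of the example schema.
pcdo = ontology_to_pcdo(
    classes=["student", "selection", "course", "institute"],
    datatype_props={
        "student_number": "student",
        "student_name": "student",
        "selection_course_code": "selection",
        "course_dept_code": "course",
        "institute_dept_code": "institute",
    },
    object_props={
        "selection_course": ("selection", "course"),
        "course_institute": ("course", "institute"),
    },
)
```

Storing edges as labelled triples keeps hasProperty edges and object-property edges distinguishable, which matters later when only the edges between class nodes are turned into join conditions.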
2) Build the dedicated word-segmentation dictionary and the keyword index. Part of the keyword index is as follows:
In the key-value pair for the key "student", the TableName attribute is "table" and the ColumnName attribute is "student", which indicates that "student" here corresponds to a relation table, and that the relation table is named "student";
In the key-value pair for the key "student number", the TableName field is "column" and the ColumnName field is "student", which indicates that "student number" here corresponds to a column name, and that the column is named "student number";
For the key "09131011", the ColumnName field is "student number" and the TableName field is "student", which indicates that the key "09131011" is a value that occurs under the "student number" column of the "student" table.
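The three kinds of index entry described above can be mirrored in a small Python sketch. A plain dict stands in for the non-relational store, and the classify helper is a hypothetical illustration of how a lookup might distinguish table names, column names, and record values; the field values simply copy the three entries described in the text.

```python
# Stand-in for the keyword index; in the described method the key-value
# pairs live in a non-relational database rather than in a Python dict.
keyword_index = {
    "student": {"TableName": "table", "ColumnName": "student"},
    "student number": {"TableName": "column", "ColumnName": "student"},
    "09131011": {"TableName": "student", "ColumnName": "student number"},
}


def classify(keyword):
    """Return what kind of schema element a keyword refers to."""
    entry = keyword_index.get(keyword)
    if entry is None:
        return "unknown"
    if entry["TableName"] == "table":
        return "table"    # the keyword is the name of a relation table
    if entry["TableName"] == "column":
        return "column"   # the keyword is the name of a column
    return "value"        # the keyword is a record value in TableName.ColumnName
```

For example, classify("09131011") reports a record value, and the entry itself says where that value occurs (table "student", column "student number").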
3) Word segmentation: for example, the user enters the query "Find the institute to which the courses selected by the student with student number 09131011 belong". After segmentation, five meaningful keywords are obtained: student number, 09131011, student, course, institute.
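Dictionary-driven segmentation of this kind can be sketched as follows. The text does not say which segmentation algorithm is used, so forward maximum matching is substituted here purely as an assumption; the segment function and the English rendering of the query are likewise illustrative.

```python
def segment(query, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary entry that starts there; otherwise skip one character."""
    max_len = max(len(word) for word in dictionary)
    keywords = []
    i = 0
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + size]
            if candidate in dictionary:
                keywords.append(candidate)
                i += size
                break
        else:
            # No dictionary entry starts here, so the character is ignored.
            i += 1
    return keywords


# The dictionary holds table names, column names, and record values.
dictionary = {"student number", "09131011", "student", "course", "institute"}
query = ("find the institute of the course selected by the student "
         "whose student number is 09131011")
keywords = segment(query, dictionary)
```

Preferring the longest match is what keeps "student number" from being split into the shorter entry "student" plus leftover text.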
4) Build the search space: using the keyword index obtained in 2), the mapping between keywords and nodes of the PCDO graph is obtained as follows:
Table 4: Mapping of keywords to PCDO graph nodes
Based on the mapping between keywords and PCDO graph nodes, the search space shown in Figure 5 is constructed (the nodes drawn in bold are the nodes to which keywords are mapped).
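Extracting the search space amounts to collecting the connected components that contain at least one mapped node. The sketch below assumes the PCDO graph is available as an undirected adjacency map; the helper names, the node names, and the unrelated "teacher" component are hypothetical and only serve to show that components without keywords are left out.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def component_of(adjacency, start):
    """Breadth-first search for all nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def build_search_space(adjacency, keyword_nodes):
    """Union of the connected components that contain a keyword node."""
    space = set()
    for node in keyword_nodes:
        if node not in space:
            space |= component_of(adjacency, node)
    return space


adjacency = undirected([
    ("student", "student.number"),
    ("student", "selection"),
    ("selection", "course"),
    ("course", "institute"),
    ("teacher", "teacher.name"),  # assumed extra component with no keyword
])
space = build_search_space(
    adjacency, ["student.number", "student", "course", "institute"])
```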
5) Connected subgraph search: using the search algorithm, find one or more connected subgraphs of the PCDO graph that connect all the keywords together; the search result is shown in Figure 6.
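One simple way to obtain a subgraph that connects all keyword nodes is to take the union of shortest paths from one keyword node to each of the others. This is a heuristic chosen for illustration, not the search algorithm of the method itself, which this passage does not spell out; the function and node names are assumptions.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the node list from source to target,
    or None when the two nodes are not connected."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def connect_keywords(adjacency, keyword_nodes):
    """Union of shortest paths from the first keyword node to the rest."""
    root, *others = keyword_nodes
    nodes, edges = {root}, set()
    for target in others:
        path = shortest_path(adjacency, root, target)
        if path is None:
            return None  # the keywords do not share a connected component
        nodes.update(path)
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return nodes, edges


adjacency = undirected([
    ("student", "student.number"),
    ("student", "selection"),
    ("selection", "course"),
    ("course", "institute"),
])
result = connect_keywords(
    adjacency, ["student.number", "student", "course", "institute"])
```

In this example the subgraph pulls in the intermediate class node "selection", which no keyword mentions but which is needed to join "student" to "course".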
6) Generate the SQL statement according to the conversion rules from a PCDO subgraph to SQL:
The select clause is filled with "*", giving the select clause: select *;
The from clause is filled with the relation table names corresponding to the class nodes, giving the from clause:
from student, curricula-variable, course, institute;
The where clause is obtained by converting each object property edge into its corresponding foreign key. The three object property edges in Figure 6, namely student _ student number _ curricula-variable _ student number, curricula-variable _ course code name _ course _ course code name, and course _ department's code name _ institute _ department's code name, are converted in turn, giving the where clause: where student _ student number = curricula-variable _ student number and curricula-variable _ course code name = course _ course code name and course _ department's code name = institute _ department's code name
Finally, the attribute nodes are processed: the node to which "09131011" is mapped is "student _ student number", which is converted to student _ student number = "09131011" and added to the where clause.
The SQL statement finally generated is:
select *
from student, curricula-variable, course, institute
where student _ student number = curricula-variable _ student number and curricula-variable _ course code name = course _ course code name and course _ department's code name = institute _ department's code name and student _ student number = "09131011"
After the SQL statement is generated, the database is queried through the database query interface, and the result is returned to the user.
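The subgraph-to-SQL rules of step 6) can be sketched in Python as follows. The subgraph_to_sql function, its argument shapes, and the table and column names (selection, student_no, course_code, dept_code) are hypothetical simplifications of the translated identifiers. One deliberate difference from the statement above is labelled in the code: the keyword value is passed as a query parameter rather than written into the where clause as a literal.

```python
def subgraph_to_sql(tables, joins, filters):
    """Turn a connected PCDO subgraph into a SQL string plus parameters.

    tables:  relation table names of the class nodes
    joins:   ((table, column), (table, column)) pairs, one per
             object-property edge, i.e. one per foreign key
    filters: ((table, column), value) pairs for keywords that matched
             record values
    """
    conditions = [
        f"{lt}.{lc} = {rt}.{rc}" for (lt, lc), (rt, rc) in joins
    ]
    params = []
    for (table, column), value in filters:
        # Assumption: use a placeholder instead of embedding the literal.
        conditions.append(f"{table}.{column} = ?")
        params.append(value)
    sql = "select * from " + ", ".join(tables)
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql, params


sql, params = subgraph_to_sql(
    tables=["student", "selection", "course", "institute"],
    joins=[
        (("student", "student_no"), ("selection", "student_no")),
        (("selection", "course_code"), ("course", "course_code")),
        (("course", "dept_code"), ("institute", "dept_code")),
    ],
    filters=[(("student", "student_no"), "09131011")],
)
```

Table and column names are interpolated directly because they come from the schema-derived graph rather than from user input; only the user-supplied value travels as a parameter.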

Claims (2)

1. A database query method based on ontology and limited natural language processing, characterized in that the method comprises the following steps:
1) converting the ontology constructed from the relational schema of the database into a graph data structure: classes in the ontology are converted into class nodes, datatype properties are converted into attribute nodes, each attribute node has one edge connecting it to the class node to which it belongs, and object properties are converted into edges connecting two class nodes;
2) building a dedicated word-segmentation dictionary and a keyword index: each record in the database is read in turn, and the record values read are added to a dictionary that serves as the dedicated word-segmentation dictionary for user queries; as each record is read, the record value is also used as a key, and the relation table name and column name corresponding to that record value in the database are used as the value, forming a key-value pair that is stored in a non-relational database as the keyword index, which is used to locate a given keyword quickly and improve query efficiency;
3) after the system receives the user's limited natural language query, decomposing the limited natural language into multiple meaningful keywords using the dedicated dictionary built in step 2);
4) using each keyword obtained in step 3) in turn as a key, looking up the corresponding value in the keyword index, that is, finding the relation table name and column name corresponding to the keyword; then finding, in the graph data structure generated in step 1), the nodes corresponding to all the relation table names and column names; and finally extracting from the graph data structure the connected components containing those nodes, as the search space;
5) traversing the connected components in the search space built in step 4) to find all connected subgraphs in the search space that connect all the keywords; if no connected subgraph satisfying this condition can be found, finding the connected subgraphs that contain as many keywords as possible; then sorting the connected subgraphs found in descending order of the number of keywords they contain, and sorting connected subgraphs that contain the same number of keywords in ascending order of the number of edges they contain; and finally selecting the k top-ranked connected subgraphs, where the value of k is determined according to the size of the database and the total number of connected subgraphs found by the search;
6) converting the k connected subgraphs selected in step 5) into SQL statements, in ranked order, according to the following rules: the Select clause of the SQL statement is filled with *, indicating that all columns are returned; the class nodes of the connected subgraph are written into the From clause of the SQL statement; each edge connecting two class nodes is converted into a foreign key relationship and written into the Where clause of the SQL statement; and the keywords entered by the user are written into the Where clause of the SQL statement according to their corresponding relation table names and column names;
after the SQL statement is generated, querying the database and returning the query result to the user.
2. The database query method based on ontology and limited natural language processing according to claim 1, characterized in that the non-relational database in step 2) is a MongoDB database.
CN201310556508.5A 2013-11-11 2013-11-11 A database query method based on ontology and limited natural language processing Active CN103646032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310556508.5A CN103646032B (en) 2013-11-11 2013-11-11 A database query method based on ontology and limited natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310556508.5A CN103646032B (en) 2013-11-11 2013-11-11 A database query method based on ontology and limited natural language processing

Publications (2)

Publication Number Publication Date
CN103646032A CN103646032A (en) 2014-03-19
CN103646032B true CN103646032B (en) 2017-01-04

Family

ID=50251248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310556508.5A Active CN103646032B (en) 2013-11-11 2013-11-11 A database query method based on ontology and limited natural language processing

Country Status (1)

Country Link
CN (1) CN103646032B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222044B2 (en) 2014-05-16 2022-01-11 Microsoft Technology Licensing, Llc Natural language image search
CN104021198B (en) * 2014-06-16 2017-09-01 北京理工大学 The relational database information search method and device indexed based on Ontology
CN104239473B (en) * 2014-09-03 2017-06-30 陈飞 The device and method of data cross-system disposal is carried out based on natural language industry knowledge base
CN104636478B (en) * 2015-02-13 2019-12-20 广州神马移动信息科技有限公司 Information query method and equipment
US9959311B2 (en) * 2015-09-18 2018-05-01 International Business Machines Corporation Natural language interface to databases
CN108614842B (en) * 2016-12-13 2021-03-30 北京国双科技有限公司 Method and device for querying data
CN109491658A (en) * 2017-09-11 2019-03-19 高德信息技术有限公司 The generation method and device of computer-executable code data
CN108446289A (en) * 2017-09-26 2018-08-24 北京中安智达科技有限公司 A kind of data retrieval method for supporting heterogeneous database
CN107885786B (en) * 2017-10-17 2021-10-26 东华大学 Natural language query interface implementation method facing big data
CN108920676B (en) * 2018-07-09 2021-09-03 清华大学 Method and system for processing graph data
US11093514B2 (en) * 2018-07-23 2021-08-17 International Business Machines Corporation Ranking of graph patterns
CN109241259B (en) * 2018-08-24 2021-01-05 国网江苏省电力有限公司苏州供电分公司 ER model-based natural language query method, device and system
CN109299129A (en) * 2018-09-05 2019-02-01 深圳壹账通智能科技有限公司 Data query method, apparatus, computer equipment and the storage medium of natural language
CN109446277A (en) * 2018-09-21 2019-03-08 北京翰云时代数据技术有限公司 Relational data intelligent search method and system based on Chinese natural language
CN109408526B (en) * 2018-10-12 2023-10-31 平安科技(深圳)有限公司 SQL sentence generation method, device, computer equipment and storage medium
CN109710742B (en) * 2018-12-27 2021-01-01 清华大学 Method, system and equipment for processing individual stock announcement natural language query
CN110888876A (en) * 2019-10-31 2020-03-17 平安科技(深圳)有限公司 Method and device for generating database script, storage medium and computer equipment
CN111008309B (en) * 2019-12-06 2023-08-08 北京百度网讯科技有限公司 Query method and device
CN112131016A (en) * 2020-09-15 2020-12-25 北京值得买科技股份有限公司 Application program internal data processing method, device and equipment
CN112001188B (en) * 2020-10-30 2021-03-16 北京智源人工智能研究院 Method and device for rapidly realizing NL2SQL based on vectorization semantic rule
CN115080602B (en) * 2022-03-21 2023-05-26 北京科杰科技有限公司 Method for realizing accurate search of data assets based on NLP algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5924089A (en) * 1996-09-03 1999-07-13 International Business Machines Corporation Natural language translation of an SQL query
CN103279458A (en) * 2013-02-22 2013-09-04 电子科技大学 Construction and instantiation method of domain ontology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
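Research on an Ontology-based natural language query interface for databases; Li Hu et al.; 《计算机科学》 (Computer Science); 20100831; 200-205 *
Research on database target retrieval based on natural language semantics; Zhang Jin et al.; 《计算机系统应用》 (Computer Systems & Applications); 20090403; 3397-3400 *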

Also Published As

Publication number Publication date
CN103646032A (en) 2014-03-19

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170630

Address after: No. 1 Dongji Road, Jiangning Economic and Technological Development Zone, Nanjing, Jiangsu Province 210008

Patentee after: Nanjing Ke Data Technology Co., Ltd.

Address before: Room 801, Unit 2, Building 1, left Ming Yuan, No. 98 Qinhuai Road, Jiangning District, Nanjing, Jiangsu Province 211100

Co-patentee before: Cui Rongguo

Patentee before: Qi Guilin

Co-patentee before: Zhang Hui

Co-patentee before: Deng Bo