CN107885786A - Method for implementing a natural language query interface for big data - Google Patents
- Publication number
- CN107885786A (application CN201710967726.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- value
- query
- tree
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for implementing a natural language query interface for big data. The method converts a Chinese natural language query into SQL or BSON: it generates a semantic dependency tree from the natural language input, converts the semantic dependency tree into a query tree according to predefined rules, and parses the query tree to obtain the SQL statement, or the BSON statement for NoSQL, that corresponds to the natural language query. The invention supports corpus extension, which improves conversion accuracy in a given domain. It also adds NoSQL conversion nodes, which broadens the method's range of application.
Description
Technical field
The present invention relates to a method for implementing an interface that accepts Chinese natural language as input and converts it into the query language of a relational or non-relational database.
Background
With the rapid development of information technology, global data resources are growing and accumulating at an unprecedented rate, and how to extract the required knowledge from big data has become a widely discussed problem. In traditional relational databases, users query data with the standard query language SQL. To write a query statement that matches their intent, users must master a large amount of syntax and be familiar with the table structures in the database. Interactive visual query interfaces reduce this learning cost to some extent, but they still require users to have some programming background and SQL knowledge. Moreover, the query statements these visual tools generate are not semantically complete: some analysis intents are difficult or impossible to express through the options of a visual query interface.
Natural language is the medium of communication people know best, so a query interface that supports natural language would be of great value for mining information from big data. However, existing natural language query interfaces still have clear limitations in big data environments. First, portability is limited: current techniques mainly convert natural language into SQL for relational databases and do not support conversion to NoSQL database queries. Second, language support is limited: they mainly support English. Third, they are not tailored to specific domains: current interfaces target the general domain, which makes conversion to the underlying database language difficult and limits usability.
Summary of the invention
The object of the invention is to provide a natural language query interface for big data.
To achieve this object, the technical solution of the invention is a method for implementing a natural language query interface for big data, characterized in that it comprises the following steps:
Step 1: obtain the natural language query statement from the front end, and call the JAR package of the Stanford Parser to parse the natural language query and obtain the corresponding semantic dependency tree;
Step 2: define the query conversion nodes;
Step 3: according to the database to be queried, add specific mapping values for each type of query conversion node, saved in the form {NodeValue | NodeType}, to build the node mapping library, where NodeValue is the mapping value field and NodeType is the type field of the query conversion node;
Step 4: compute the similarity between a node w1 in the semantic dependency tree and each element NodeValue in the node mapping library, sort by similarity, and take the top K elements whose similarity exceeds a given threshold as the candidate mappings of node w1;
Step 5: to avoid the problems that semantic ambiguity would cause in later conversion, return the computed candidate mappings of node w1 to the front end, build an interactive selection interface, and send the mapping value the user judges best to the back end, which yields the final mapping of the semantic dependency tree; if none of the candidate mappings of node w1 is correct, the user enters the mapping manually, which likewise yields the final mapping of the semantic dependency tree;
Step 6: adjust the semantic dependency tree that has received its final mapping: traverse its nodes, remove every node that has no mapping, and move the subtrees of each removed node up to that node's parent;
Step 7: define the query tree structure and convert the semantic dependency tree into a query tree;
Step 8: adjust the query tree structure to eliminate semantic ambiguity;
Step 9: according to the user's choice, convert the query tree into a NoSQL or SQL query.
Preferably, in step 2, the types of query conversion node are: select node, operator node, function node, table structure node, value node, quantifier (measure-word) node, logic node, embedded table node, and MR node. The select node corresponds to keywords in the natural language that are synonymous with "query" or "retrieve"; the operator node corresponds to operators such as "greater than"; the function node corresponds to operations such as aggregation; the table structure node falls into two classes, corresponding to table names and attribute names in the database; the quantifier node corresponds to keywords synonymous with "some" or "all"; the logic node corresponds to words synonymous with "and", "or", and "not"; the embedded table node and the MR node are newly added NoSQL nodes, corresponding respectively to data files and operations in NoSQL databases.
Preferably, step 3 comprises:
Step 3.1: add table structure nodes to the node mapping library, comprising the following steps:
Step 3.1.1: connect to the database to be queried and obtain the table names and field information of all tables in the database;
Step 3.1.2: encapsulate each table name in the JSON form {NameNode: NameNode.Value | NameNode.Table} and each field name in the form {NameNode: NameNode.Value | NameNode.Attribute}, and store them in the node mapping library, where NameNode is the table structure node type, NameNode.Value is the value field, NameNode.Table is the table field, and NameNode.Attribute is the attribute field;
Step 3.2: add embedded table nodes to the node mapping library, comprising the following steps:
Step 3.2.1: query the document structure in the database and retrieve the embedded documents;
Step 3.2.2: obtain the field names in the embedded documents, encapsulate each embedded document name in the JSON form {DocumentNode: DocumentNode.Value | DocumentNode.TABLE} and each embedded document field name in the form {DocumentNode: {DocumentNode.Value | DocumentNode.TABLE.ATTRIBUTE}}, and store them in the node mapping library, where DocumentNode is the embedded document node type, DocumentNode.Value is the value field of the embedded document, DocumentNode.TABLE is the document name field of the embedded document, and DocumentNode.TABLE.ATTRIBUTE is the attribute field of the embedded document;
Step 3.3: add value nodes to the node mapping library, comprising the following steps:
Step 3.3.1: set the sampling ratio β;
Step 3.3.2: access the table fields in turn using the database query language, extract concrete values of each field at the sampling ratio, and remove duplicate values to make value node matching more efficient;
Step 3.3.3: store the set of concrete values extracted for each table field in the node mapping library in the JSON form {ValueNode: Value | TABLE.ATTRIBUTE}, where ValueNode is the value node type, Value is the value field of the value node, and TABLE.ATTRIBUTE is the table and attribute field that the value node corresponds to;
Step 3.4: for the other node types, the user defines the node conversion content, which is stored in the node mapping library in the JSON form {NodeType: NodeValue | NodeType}.
Preferably, step 4 comprises:
Step 4.1: based on the domain of the database, extend the semantic dictionary toward that specific domain by expanding the HowNet Chinese semantic base, comprising the following steps:
Step 4.1.1: segment each field value into words, compute word frequencies over the resulting vocabulary, and select the top S words as extension seed words;
Step 4.1.2: following the definitions of the basic sememes and relational sememes in HowNet, describe the selected top S words and the structural information of the database in terms of sememes, and add them to the semantic dictionary;
Step 4.2: using the extended semantic dictionary, compute the similarity between node w1 in the semantic dependency tree and each element NodeValue in the node mapping library, so that nodes of the semantic dependency tree are mapped to elements of the node mapping library; sort by similarity, and take the top K elements whose similarity exceeds the given threshold as the candidate mappings of node w1.
Preferably, step 4.2 comprises:
Step 4.2.1: first match the nodes of the semantic dependency tree against the node types other than value nodes, as follows:
Step 4.2.1.1: based on the HowNet semantic base, compute the similarity between semantic dependency tree node w1 and a value w2 under node type x in the node mapping library, as follows:
Step 4.2.1.1.1: load the HowNet semantic base file WHOLE.DAT and build the tree-shaped sememe hierarchy T;
Step 4.2.1.1.2: load the file glossary.dat, build the vocabulary, and look up the concepts of node w1 and value w2 and their corresponding sememes;
Step 4.2.1.1.3: if each word corresponds to a single sememe, the similarity is computed by a formula (not reproduced in this text) in which d is the length of the shortest path connecting the two sememes in the sememe hierarchy T built in step 4.2.1.1.1 and α is an adjustable parameter; if the corresponding sememes form a set, the similarity of the two words is the mean of the similarities of the matched sememe pairs;
Step 4.2.1.2: select the top K candidate mapping values whose similarity is greater than or equal to the set threshold;
Step 4.2.2: if the similarity in step 4.2.1.2 is below the set threshold, compute the mapping of the word to value nodes, comprising the following steps:
Step 4.2.2.1: using the similarity computation of step 4.2.1.1, compute the similarity between node w1 and the values of each field;
Step 4.2.2.2: select the top K candidate mapping values;
Step 4.2.2.3: if the lowest similarity among the top K candidate mapping values is below the set threshold, take the fields of the top K candidate mapping values, query all values of those fields in the database, compute the similarities one by one, and again select the top K mapping values.
Preferably, step 7 comprises:
Step 7.1: according to the node mapping types, define the query tree grammar rules, which are as follows:
1) Root node -> (query subtree) (condition subtree)*
2) Query subtree -> select node + GNP
3) Condition subtree -> operator node | (MR node + operator node) + (leftSubtree * rightSubtree)
4) leftSubtree -> GNP
5) rightSubtree -> GNP | value node | MIN | MAX
6) GNP -> (function node + GNP) | NP
7) NP -> table structure node | embedded table node + (table structure node | embedded table node)* (Condition)*
8) Condition -> value node | (operator node + value node)
where -> denotes the relation between an upper level and a lower level; * denotes a sibling relation; + denotes a parent-child relation; | denotes alternatives; leftSubtree denotes the left subtree; rightSubtree denotes the right subtree; MIN denotes the minimum function; MAX denotes the maximum function; GNP and NP are user-defined intermediate subtree structures; Condition denotes a condition subtree;
Step 7.2: restructure the semantic dependency tree to convert it into the corresponding query tree, comprising the following steps:
Step 7.2.1: traverse the semantic dependency tree and move nodes and subtrees at random to random positions, adjusting each move with reference to the database structure;
Step 7.2.2: traverse each newly generated query tree and score it against the query tree grammar rules, so that the more nodes satisfy the grammar rules, the higher the score; store the query tree in the candidate queue;
Step 7.2.3: select the top t candidate query trees and return them to the front end for interaction; the user selects the most correct query tree structure.
Preferably, step 8 comprises:
Step 8.1: eliminate semantic ambiguity in complex queries, comprising the following steps:
Step 8.1.1: define the concept of a core node: for a complex query, the core node is the topmost table structure node or embedded table node in the left subtree or right subtree under the condition of the complex query;
Step 8.1.2: if, under the condition of a complex query, the core nodes of the left and right subtrees differ in type, add the core node of the left subtree to the right subtree as its new core node;
Step 8.2: eliminate the semantic ambiguity of the returned fields by adding the table structure node or embedded table node under the select node of the left subtree to the right subtree;
Step 8.3: eliminate the semantic ambiguity of function nodes, comprising the following steps:
Step 8.3.1: if the table structure node or embedded table node under a function node is not a statistical numeric value, add a counting function node above that table structure node or embedded table node;
Step 8.3.2: for a function node inside an operator node's subtree, add the function node in the left subtree under the operator node to the corresponding position in the right subtree, or add the function node in the right subtree under the operator node to the corresponding position in the left subtree.
Preferably, step 9 comprises:
Step 9.1: define the concept of a query block and establish the nesting relation between block levels, comprising the following steps:
Step 9.1.1: a query block is a subtree whose root node is a select node, a table structure node modified by a quantifier (measure-word) node, an embedded table node, or a function node; the subtree rooted at the select node is the main block;
Step 9.1.2: if the root node of block A is the parent of the root node of another block B, block A is said to directly contain block B; the main block is the outermost block;
Step 9.2: convert the query tree into a NoSQL query statement;
Step 9.3: convert the query tree into an SQL query statement.
The invention provides a strategy for converting a Chinese natural language query into SQL or BSON: it generates a semantic dependency tree from the natural language input, converts the semantic dependency tree into a query tree according to predefined rules, and parses the query tree to obtain the corresponding SQL statement or NoSQL BSON statement. The invention supports corpus extension, which improves conversion accuracy in a given domain. It also adds NoSQL conversion nodes, which broadens the method's range of application.
Brief description of the drawings
Fig. 1 is the semantic dependency tree corresponding to the Chinese natural language query q;
Fig. 2 is the final mapping of the semantic dependency tree corresponding to the Chinese natural language query q;
Fig. 3 is the query tree corresponding to the natural language query q.
Detailed description of the embodiments
To make the invention easier to understand, a preferred embodiment is described in detail below.
The invention provides a strategy for converting a Chinese natural language query into SQL or BSON. Current natural language query interfaces adapt poorly to specific domains and have difficulty converting queries for NoSQL databases. To address these limitations, the invention applies domain-based semantic extension to the corpus used during conversion and provides a method for converting a query tree into NoSQL database query languages such as BSON, which improves both the accuracy and the range of application of query conversion. The specific steps are as follows:
Step 1: obtain the natural language query statement from the front end, and call the JAR package of the Stanford Parser to parse the natural language query and obtain the corresponding semantic dependency tree. Taking the following query statement q as an example (q: query the number of patients who have had a puncture and are older than 44), the resulting semantic dependency tree is shown in Fig. 1.
Step 2: define nine types of query conversion node: select node (SelectNode), operator node (OperatorNode), function node (FunctionNode), table structure node (NameNode), value node (ValueNode), quantifier node (NumeralNode), logic node (LogicNode), embedded table node (DocumentNode), and MR node (MRNode). The select node corresponds to keywords in the natural language such as "query" and "retrieve". The operator node corresponds to operators such as "greater than" and "less than". The function node corresponds to operations such as aggregation. The table structure node falls into two classes, corresponding to tables (NameNode.Table) and attribute names (NameNode.Attribute) in the database. The quantifier node corresponds to keywords such as "some" and "all". The logic node corresponds to words such as "and", "or", and "not". The embedded table node and the MR node are newly added NoSQL nodes, corresponding respectively to data files and operations in NoSQL databases.
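As a minimal sketch, the nine node types of step 2 can be written as a Python enumeration. The member names and the NOSQL_ONLY set are illustrative choices rather than part of the patent, and reading "MR" as a MapReduce-style operation is an assumption based on the mapreduce conversion described in step 9.2.2.

```python
from enum import Enum


class NodeType(Enum):
    """The nine query conversion node types listed in step 2."""
    SELECT = "SelectNode"      # keywords such as "query", "retrieve"
    OPERATOR = "OperatorNode"  # operators such as "greater than", "less than"
    FUNCTION = "FunctionNode"  # aggregation-style operations
    NAME = "NameNode"          # table names and attribute names
    VALUE = "ValueNode"        # concrete field values
    NUMERAL = "NumeralNode"    # quantifiers such as "some", "all"
    LOGIC = "LogicNode"        # "and", "or", "not"
    DOCUMENT = "DocumentNode"  # NoSQL: embedded documents
    MR = "MRNode"              # NoSQL: assumed to mean MapReduce-style operations


# Node types that exist only to support NoSQL conversion.
NOSQL_ONLY = {NodeType.DOCUMENT, NodeType.MR}
```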
Step 3: according to the database to be queried, add specific mapping values for each type of node, saved in the form {NodeValue | NodeType}, to build the node mapping library. Specifically:
Step 3.1: add table structure nodes to the node mapping library, as follows:
Step 3.1.1: connect to the database to be queried and obtain the table names and field information of all tables in the database;
Step 3.1.2: encapsulate each table name in the JSON form {NameNode: NameNode.Value | NameNode.Table} and each field name in the form {NameNode: NameNode.Value | NameNode.Attribute}, and store them in the node mapping library, where NameNode is the table structure node type, NameNode.Value is the value field, NameNode.Table is the table field, and NameNode.Attribute is the attribute field.
Suppose the database contains a table "puncture" and that table has a field "age"; the generated mapping information is then: {NameNode: puncture | NameNode.Table, puncture.age | NameNode.Attribute}.
Step 3.2: add embedded table nodes to the node mapping library, as follows:
Step 3.2.1: query the document structure in the database and retrieve the embedded documents;
Step 3.2.2: obtain the field names in the embedded documents, encapsulate each embedded document name in the JSON form {DocumentNode: DocumentNode.Value | DocumentNode.TABLE} and each embedded document field name in the form {DocumentNode: {DocumentNode.Value | DocumentNode.TABLE.ATTRIBUTE}}, and store them in the node mapping library, where DocumentNode is the embedded document node type, DocumentNode.Value is the value field of the embedded document, DocumentNode.TABLE is the document name field of the embedded document, and DocumentNode.TABLE.ATTRIBUTE is the attribute field of the embedded document;
Step 3.3: add value nodes to the node mapping library, as follows:
Step 3.3.1: set the sampling ratio β;
Step 3.3.2: access the table fields in turn using the database query language, extract concrete values of each field at the sampling ratio, and remove duplicate values to make value node matching more efficient;
Step 3.3.3: store the set of concrete values extracted for each table field in the node mapping library in the JSON form {ValueNode: Value | TABLE.ATTRIBUTE}, where ValueNode is the value node type, Value is the value field of the value node, and TABLE.ATTRIBUTE is the table and attribute field that the value node corresponds to.
For example, if the value 44 is sampled from the age field of the puncture table, the corresponding information is {ValueNode: 44 | puncture.age}.
Step 3.4: for the other node types, the user defines the node conversion content, which is stored in the node mapping library in the JSON form {NodeType: {NodeValue | NodeType}}.
For example, the content corresponding to the select node is {SelectNode: query, find | SelectNode}.
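Steps 3.1, 3.3 and 3.4 can be sketched as follows for a relational source. This is only an illustration under stated assumptions: it uses an in-memory SQLite database in place of the database to be queried, represents each mapping entry as a Python dict with hypothetical keys node_type, value and target, and hard-codes two select-node keywords. Embedded table nodes (step 3.2) are omitted because they depend on the NoSQL store.

```python
import math
import random
import sqlite3


def build_mapping_library(conn, beta=0.5, seed=0):
    """Return a list of mapping entries (dicts) for one SQLite database."""
    rng = random.Random(seed)
    library = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Step 3.1: table names and attribute names become NameNode entries.
        library.append({"node_type": "NameNode", "value": table,
                        "target": "NameNode.Table"})
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            library.append({"node_type": "NameNode", "value": f"{table}.{column}",
                            "target": "NameNode.Attribute"})
            # Step 3.3: sample a fraction beta of the values, then deduplicate.
            values = [row[0] for row in conn.execute(
                f'SELECT "{column}" FROM "{table}"')]
            k = math.ceil(beta * len(values))
            for value in sorted(set(rng.sample(values, k)), key=str):
                library.append({"node_type": "ValueNode", "value": value,
                                "target": f"{table}.{column}"})
    # Step 3.4: user-defined entries for the remaining node types.
    for word in ("query", "find"):
        library.append({"node_type": "SelectNode", "value": word,
                        "target": "SelectNode"})
    return library


# Toy data mirroring the "puncture" table and "age" field of the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE puncture (age INTEGER)")
conn.executemany("INSERT INTO puncture VALUES (?)", [(44,), (44,), (60,)])
library = build_mapping_library(conn, beta=1.0)
```

With beta set to 1.0 every row is sampled, so the duplicate age 44 collapses into a single value node entry.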
Step 4: compute the similarity between a node w1 in the semantic dependency tree and each element NodeValue in the node mapping library, sort by similarity, and take the top K elements whose similarity exceeds a given threshold as the candidate mappings of w1.
Step 4.1: based on the domain of the database, extend the semantic dictionary toward that specific domain by expanding the HowNet Chinese semantic base, as follows:
Step 4.1.1: segment each field value into words, compute word frequencies over the resulting vocabulary, and select the top s words as extension seed words;
Step 4.1.2: following the definitions of the basic sememes, relational sememes and so on in HowNet, describe the selected top s seed words and the structural information of the database in terms of sememes, and add them to the semantic dictionary;
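The seed word selection of step 4.1.1 amounts to a frequency count over segmented field values. The sketch below assumes a caller-supplied tokenize function; whitespace splitting and the English sample values are stand-ins for a real Chinese word segmenter and real field data, neither of which the patent specifies.

```python
from collections import Counter


def select_seed_words(field_values, tokenize, top_s):
    """Count word frequencies over all field values and return the top_s words."""
    counts = Counter()
    for value in field_values:
        counts.update(tokenize(value))
    return [word for word, _ in counts.most_common(top_s)]


# Illustrative field values; str.split stands in for a word segmenter.
values = ["lumbar puncture", "bone marrow puncture", "lumbar anesthesia"]
seeds = select_seed_words(values, str.split, top_s=2)
```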
Step 4.2: using the extended semantic dictionary, compute word similarities and map the nodes of the semantic dependency tree to elements of the node mapping library.
Step 4.2.1: first match the nodes of the semantic dependency tree against the node types other than the value node type, as follows:
Step 4.2.1.1: based on the HowNet semantic base, compute the similarity between semantic dependency tree node w1 and a value w2 under node type x in the node mapping library, as follows:
Step 4.2.1.1.1: load the HowNet semantic base file WHOLE.DAT and build the tree-shaped sememe hierarchy T;
Step 4.2.1.1.2: load the file glossary.dat, build the vocabulary, and look up the concepts of w1 and w2 and their corresponding sememes;
Step 4.2.1.1.3: if each word corresponds to a single sememe, the similarity is computed by a formula (not reproduced in this text) in which d is the length of the shortest path connecting the two sememes in the sememe hierarchy T built in step 4.2.1.1.1 and α is an adjustable parameter; if the corresponding sememes form a set, the similarity of the two words is the mean of the similarities of the matched sememe pairs.
Step 4.2.1.2: select the top K candidate mapping values whose similarity is greater than or equal to the set threshold;
Step 4.2.2: if the similarity in step 4.2.1.2 is below the set threshold, compute the mapping of the word to value nodes, as follows:
Step 4.2.2.1: using the similarity computation of step 4.2.1.1, compute the similarity between w1 and the values of each field;
Step 4.2.2.2: select the top K candidate mapping values;
Step 4.2.2.3: if the lowest similarity among the top K candidate mapping values is below the set threshold, take the fields of the top K candidate mapping values, query all values of those fields in the database, compute the similarities one by one, and again select the top K mapping values.
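The similarity formula itself is not reproduced in this text. A form commonly used with HowNet sememe hierarchies is sim = α / (d + α), and the sketch below assumes that form; it should not be read as the patent's confirmed formula. The toy hierarchy, the default alpha of 1.6, and the function names are all illustrative assumptions.

```python
def path_length(parent, a, b):
    """Shortest path length between a and b in a tree of child -> parent links."""
    steps_from_a = {}
    node, steps = a, 0
    while node is not None:
        steps_from_a[node] = steps
        node, steps = parent.get(node), steps + 1
    node, steps = b, 0
    while node is not None:
        if node in steps_from_a:
            # First common ancestor found: path goes a -> ancestor -> b.
            return steps_from_a[node] + steps
        node, steps = parent.get(node), steps + 1
    raise ValueError("nodes are not in the same tree")


def sememe_similarity(parent, a, b, alpha=1.6):
    """Assumed form: alpha / (d + alpha), where d is the path length in T."""
    return alpha / (path_length(parent, a, b) + alpha)


def top_k_candidates(scores, k, threshold):
    """Keep entries scoring at least threshold, best first, at most k of them."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(value, score) for value, score in ranked if score >= threshold][:k]


# Toy sememe hierarchy T (child -> parent); the root maps to None.
T = {"entity": None, "thing": "entity", "human": "thing",
     "patient": "human", "doctor": "human", "time": "entity", "age": "time"}
scores = {w2: sememe_similarity(T, "patient", w2)
          for w2 in ("doctor", "age", "patient")}
candidates = top_k_candidates(scores, k=2, threshold=0.4)
```

Under this assumed form, identical sememes have similarity 1 and the value falls toward 0 as the path between them grows, which is why a threshold combined with a top K cut-off can reject distant matches.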
Step 5: to avoid the problems that semantic ambiguity would cause in later conversion, return the computed candidate mappings of each node to the front end, build an interactive selection interface, and send the mapping value the user judges best to the back end. If none of the candidate mappings is correct, the user can enter the mapping manually. The user's selections yield the final mapping of the semantic dependency tree.
Step 6: adjust the semantic dependency tree that has received its final mapping: traverse the tree nodes, remove every node that has no mapping, and move the subtrees of each removed node up to that node's parent;
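The pruning in step 6 can be sketched as a post-order traversal. The TreeNode class and the sample tree are hypothetical, and the sketch assumes the root itself has a mapping.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    word: str
    mapping: Optional[str] = None  # None means the node has no mapping
    children: List["TreeNode"] = field(default_factory=list)


def prune_unmapped(node):
    """Return the list of nodes that take the place of `node` after pruning."""
    kept_children = []
    for child in node.children:
        kept_children.extend(prune_unmapped(child))
    if node.mapping is None:
        # Drop this node; its surviving subtrees move up to its parent.
        return kept_children
    node.children = kept_children
    return [node]


root = TreeNode("query", "SelectNode", [
    TreeNode("of", None, [TreeNode("patients", "NameNode")]),
    TreeNode("age", "NameNode"),
])
(root,) = prune_unmapped(root)
```

Here the unmapped node "of" disappears and its child "patients" becomes a direct child of the root, ahead of "age".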
Step 7: define the query tree structure and convert the semantic dependency tree into a query tree, as follows:
Step 7.1: according to the node mapping types, define the query tree grammar rules, which are as follows:
1) Root node -> (query subtree) (condition subtree)*
2) Query subtree -> select node + GNP
3) Condition subtree -> operator node | (MR node + operator node) + (leftSubtree * rightSubtree)
4) leftSubtree -> GNP
5) rightSubtree -> GNP | value node | MIN | MAX
6) GNP -> (function node + GNP) | NP
7) NP -> table structure node | embedded table node + (table structure node | embedded table node)* (Condition)*
8) Condition -> value node | (operator node + value node)
where -> denotes the relation between an upper level and a lower level; * denotes a sibling relation; + denotes a parent-child relation; | denotes alternatives; leftSubtree denotes the left subtree; rightSubtree denotes the right subtree; MIN denotes the minimum function; MAX denotes the maximum function; GNP and NP are user-defined intermediate subtree structures; Condition denotes a condition subtree;
Step 7.2: restructure the semantic dependency tree to convert it into the corresponding query tree, as follows:
Step 7.2.1: traverse the semantic dependency tree and move nodes and subtrees at random to random positions, adjusting each move with reference to the database structure;
Step 7.2.2: traverse each newly generated query tree and score it against the query tree grammar rules, so that the more nodes satisfy the grammar rules, the higher the score; store the query tree in the candidate queue;
Step 7.2.3: select the top K candidate query trees and return them to the front end for interaction; the user selects the most correct query tree structure.
The query tree finally returned for the semantic dependency tree of Fig. 1 is shown in Fig. 3.
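One possible reading of the scoring in step 7.2.2 is to count the nodes whose parent-child placement is permitted by the grammar. The sketch below assumes that reading and flattens rules 1 to 8 into a table of allowed child types per parent type; this table is a simplification made for illustration, not the patent's own data structure. Trees are (node type, children) tuples.

```python
# Child node types permitted under each parent type (simplified from rules 1-8).
ALLOWED_CHILDREN = {
    "ROOT": {"SelectNode", "OperatorNode", "MRNode"},
    "SelectNode": {"FunctionNode", "NameNode", "DocumentNode"},
    "FunctionNode": {"FunctionNode", "NameNode", "DocumentNode"},
    "NameNode": {"NameNode", "DocumentNode", "ValueNode", "OperatorNode"},
    "DocumentNode": {"NameNode", "DocumentNode", "ValueNode", "OperatorNode"},
    "OperatorNode": {"FunctionNode", "NameNode", "DocumentNode", "ValueNode"},
    "MRNode": {"OperatorNode"},
}


def grammar_score(tree):
    """Count the descendants whose placement under their parent is allowed."""
    node_type, children = tree
    allowed = ALLOWED_CHILDREN.get(node_type, set())
    score = 0
    for child in children:
        if child[0] in allowed:
            score += 1
        score += grammar_score(child)
    return score


def top_candidates(trees, t):
    """Return the t highest-scoring candidate query trees."""
    return sorted(trees, key=grammar_score, reverse=True)[:t]


good = ("ROOT", [
    ("SelectNode", [("FunctionNode", [("NameNode", [])])]),
    ("OperatorNode", [("NameNode", []), ("ValueNode", [])]),
])
bad = ("ROOT", [
    ("ValueNode", [("SelectNode", [("NameNode", [])])]),
    ("OperatorNode", [("NameNode", []), ("FunctionNode", [])]),
])
best = top_candidates([bad, good], t=1)
```

A real implementation would also need the random restructuring of step 7.2.1 to generate the candidates; only the ranking is shown here.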
Step 8: adjust the query tree structure to eliminate semantic ambiguity, as follows:
Step 8.1: eliminate semantic ambiguity in complex queries, comprising:
Step 8.1.1: define the concept of a core node: for a complex query, the core node is the topmost table structure node or embedded table node in the left subtree (or right subtree) under the condition of the complex query;
Step 8.1.2: if, under the condition of a complex query, the core nodes of the left and right subtrees differ in type, add the core node of the left subtree to the right subtree as its new core node;
Step 8.2: eliminate the semantic ambiguity of the returned fields by adding the table structure node or embedded table node under the select node of the left subtree to the right subtree;
Step 8.3: eliminate the semantic ambiguity of function nodes, comprising:
Step 8.3.1: if the table structure node or embedded table node under a function node is not a statistical numeric value, add a counting function node above that table structure node or embedded table node;
Step 8.3.2: for a function node inside an operator node's subtree, add the function node in the left (right) subtree under the operator node to the corresponding position in the right (left) subtree.
Step 9, the selection according to user, query tree is converted into NoSQL or SQL query.Comprise the following steps that:
Step 9.1, query block concept is defined, set block level nest relation, comprise the following steps that:
Step 9.1.1, query block is to select node, the table structure node modified by measure word node, embedded table section
Point, the subtree that function node is root node, wherein to select block based on subtree of the node as root node;
If step 9.1.2, block A root node is father's node of another block B root nodes, claim block A direct
Comprising block B, main piece is outermost layer block;
Step 9.2, convert the query tree into a NoSQL query statement, taking a MongoDB database as an example. The specific steps are as follows:
Step 9.2.1, convert the query blocks into subqueries bottom-up according to the nesting relationships between blocks, specifically:
Step 9.2.1.1, in an inner nested block, associate the documents corresponding to the table structure nodes or embedded table nodes involved by adding them to the $lookup field; obtain the connection path from the database schema graph and add it to the corresponding localField and foreignField parts, and store the associated document in the from field. If a table structure node or embedded table node involved in the query already exists in an inner block, place the query result of that inner block into the $lookup field, so that the corresponding table name need not be fetched from the database again;
Step 9.2.1.2, add the field corresponding to each table structure node or embedded table node to the $project field;
Step 9.2.1.3, add the operator node and the table structure nodes or embedded table nodes involved to the corresponding part of $match;
Step 9.2.1.4, add the operation associated with the function node to $group, and specify the operator according to its type;
Step 9.2.1.5, name the output result set according to the nesting relationship of the blocks and add it to the out part;
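As a rough illustration of steps 9.2.1.1 to 9.2.1.5, the following Python sketch assembles a MongoDB aggregation pipeline as plain dictionaries. The function name `build_pipeline`, the collection and field names in the usage note, and the stage order are assumptions made for this example; the added `$unwind` stage is also not part of the text and is included only so that fields of the joined document can be filtered.

```python
from typing import Any, Dict, List


def build_pipeline(join: Dict[str, str],
                   condition: Dict[str, Any],
                   out_name: str) -> List[Dict[str, Any]]:
    """Assemble aggregation stages for one inner block that counts joined rows."""
    return [
        # Step 9.2.1.1: associate the documents of the joined collection.
        {"$lookup": {
            "from": join["from"],
            "localField": join["localField"],
            "foreignField": join["foreignField"],
            "as": join["as"],
        }},
        # $lookup produces an array; unwind it so joined fields can be matched.
        {"$unwind": "$" + join["as"]},
        # Step 9.2.1.3: filter condition derived from the operator node.
        {"$match": condition},
        # Step 9.2.1.4: aggregation derived from the function node (a count here).
        {"$group": {"_id": None, "count": {"$sum": 1}}},
        # Step 9.2.1.2: fields to return.
        {"$project": {"_id": 0, "count": 1}},
        # Step 9.2.1.5: name the result set after the block.
        {"$out": out_name},
    ]
```

For example, `build_pipeline({"from": "puncture", "localField": "card_no", "foreignField": "card_no", "as": "puncture"}, {"puncture.age": {"$gt": 44}}, "block_1")` yields a pipeline that could be passed to a collection's `aggregate` method in a MongoDB driver.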
Step 9.2.2, if the block at this level contains an MR node, the conversion proceeds as follows:
Step 9.2.2.1, add the table structure node or embedded table node to the emit call of the map function in mapreduce, specifying the grouping field;
Step 9.2.2.2, add the operator node to the query part in reduce, specifying the filter field;
Step 9.2.2.3, add the function node to the return part in reduce, specifying the type of aggregation operation to perform;
Step 9.2.2.4, name the output result set according to the nesting relationship of the blocks and add it to the out part in reduce;
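The MR-node conversion of step 9.2.2 might be sketched as follows, building the document for MongoDB's `mapReduce` database command. The function name and the example collection and field names are hypothetical. Note that in the MongoDB command `query` and `out` are options alongside `map` and `reduce`, and that map-reduce has been deprecated since MongoDB 5.0 in favour of the aggregation pipeline.

```python
from typing import Any, Dict


def build_map_reduce(collection: str, group_field: str,
                     filter_doc: Dict[str, Any], out_name: str) -> Dict[str, Any]:
    """Build a mapReduce command document that counts documents per group."""
    # Step 9.2.2.1: the grouping field goes into emit() inside the map function.
    map_js = "function () { emit(this.%s, 1); }" % group_field
    # Step 9.2.2.3: the reduce function returns the aggregated value (a sum here).
    reduce_js = "function (key, values) { return Array.sum(values); }"
    return {
        "mapReduce": collection,   # command name must be the first key
        "map": map_js,
        "reduce": reduce_js,
        "query": filter_doc,       # step 9.2.2.2: filter from the operator node
        "out": out_name,           # step 9.2.2.4: result set named after the block
    }
```

A call such as `build_map_reduce("puncture", "department", {"age": {"$gt": 44}}, "block_1")` produces a document that a driver could send as a database command.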
Step 9.2.3, convert the blocks by successive nesting: if a function node involving an aggregation operation is present, add the block's inner sub-blocks to the $lookup part for association; if the query is an ordinary one involving only joins, add the content involved in the table structure nodes and embedded table nodes to the find part, add conditional operators for filtering according to the content of the operator node, and name and store the subquery results according to the nesting relationship;
Step 9.2.4, the main block is the outermost block; add the fields corresponding to the table structure nodes or embedded table nodes under the select node to find as the specified return fields.
Step 9.3, convert the query tree into an SQL query statement. Conversion to SQL is simpler than conversion to NoSQL, and the process is largely the same, so it is described only briefly here. The steps are as follows:
Step 9.3.1, convert the query blocks into subqueries bottom-up according to the nesting relationships between blocks;
Step 9.3.2, add the operator node and the related table structure nodes to the WHERE part to build the query condition;
Step 9.3.3, add the table structure nodes involved to the FROM part;
Step 9.3.4, add the operation associated with the function node and the related table structure nodes to the SELECT clause to build the aggregation condition. The SQL query statement converted from the query tree shown in Figure 3 is as follows:
q2: SELECT count(patient information table.*)
FROM patient information table, puncture
WHERE patient information table.medical card number = puncture.medical card number AND puncture.age > 44.
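Steps 9.3.2 to 9.3.4 can be tried out with a small Python sketch that assembles the three clauses and runs the result against an in-memory SQLite database. The table names `patient_info` and `puncture`, the column names, the sample rows, and the helper `build_sql` are all invented for illustration, and `count(*)` stands in for the count over the patient table in q2.

```python
import sqlite3
from typing import List


def build_sql(select_expr: str, tables: List[str], conditions: List[str]) -> str:
    """Assemble SELECT (function node), FROM (table nodes) and WHERE (operator nodes)."""
    return "SELECT {} FROM {} WHERE {}".format(
        select_expr, ", ".join(tables), " AND ".join(conditions)
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_info (card_no TEXT, name TEXT);
CREATE TABLE puncture (card_no TEXT, age INTEGER);
INSERT INTO patient_info VALUES ('A1', 'p1'), ('A2', 'p2'), ('A3', 'p3');
INSERT INTO puncture VALUES ('A1', 50), ('A2', 30), ('A3', 61);
""")

sql = build_sql(
    "count(*)",
    ["patient_info", "puncture"],
    ["patient_info.card_no = puncture.card_no", "puncture.age > 44"],
)
count = conn.execute(sql).fetchone()[0]  # two sample patients are older than 44
```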
Claims (8)
1. A method for implementing a natural language query interface for big data, characterized by comprising the following steps:
Step 1, obtain the natural language query statement from the front end, and call the JAR package of the Stanford Parser to parse the natural language query and obtain the corresponding semantic dependency tree;
Step 2, define the query conversion nodes;
Step 3, according to the database to be queried, add specific mapping values for each type of query conversion node, stored in the format {NodeValue | NodeType}, to build the node mapping library, where NodeValue is the mapping value field and NodeType is the type field of the query conversion node;
Step 4, compute the similarity between node w1 in the semantic dependency tree and each element NodeValue in the node mapping library, sort by similarity, and take the top K elements whose similarity exceeds the given threshold as the candidate mappings of node w1;
Step 5, to avoid the problems that semantic ambiguity would cause for subsequent conversion, return the computed candidate mappings of node w1 to the front end and build a user-interaction selection interface; the optimal mapping value chosen by the user is passed back to the back end, yielding the final mapping of the semantic dependency tree; if the candidate mappings of node w1 are incorrect, the user enters the mapping manually, yielding the final mapping of the semantic dependency tree;
Step 6, adjust the finally mapped semantic dependency tree: traverse the nodes of the semantic dependency tree, remove each node that has no mapping, and move the subtree of that node up to its parent node;
Step 7, define the query tree structure and convert the semantic dependency tree into a query tree;
Step 8, adjust the query tree structure to eliminate semantic ambiguity;
Step 9, according to the user's selection, convert the query tree into a NoSQL or SQL query.
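The tree adjustment in step 6 can be sketched in Python as a recursive pruning pass. The `DepNode` class and `prune_unmapped` function are hypothetical names for this illustration; the function returns a list because removing an unmapped root leaves its children as separate trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DepNode:
    word: str
    mapping: Optional[str] = None            # final mapping, or None if unmapped
    children: List["DepNode"] = field(default_factory=list)


def prune_unmapped(node: DepNode) -> List[DepNode]:
    """Return the nodes that replace `node` after pruning:
    the node itself if it is mapped, otherwise its pruned children,
    which are thereby moved up to the parent."""
    kept: List[DepNode] = []
    for child in node.children:
        kept.extend(prune_unmapped(child))
    if node.mapping is None:
        return kept
    node.children = kept
    return [node]
```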
2. The method for implementing a natural language query interface for big data according to claim 1, characterized in that, in step 2, the types of query conversion node are: select node, operator node, function node, table structure node, value node, quantifier node, logical node, embedded table node, and MR node, where the select node corresponds to keywords in natural language synonymous with "query" or "retrieve"; the operator node corresponds to operators such as "greater than"; the function node corresponds to operations such as aggregation; table structure nodes fall into two classes, corresponding respectively to table names and attribute names in the database; the quantifier node corresponds to keywords synonymous with "some" or "all"; the logical node corresponds to words synonymous with "and", "or", or "not"; the embedded table node and the MR node are newly added NoSQL nodes, corresponding respectively to data files and operations in a NoSQL database.
3. The method for implementing a natural language query interface for big data according to claim 2, characterized in that step 3 comprises:
Step 3.1, add table structure nodes to the node mapping library, comprising:
Step 3.1.1, connect to the database to be queried and obtain the table names and field information of all tables in the database;
Step 3.1.2, encapsulate each table name in the JSON format {Namenode: NameNode.Value | NameNode.Table} and each field name in the format {Namenode: NameNode.Value | NameNode.Attribute}, and store them in the node mapping library, where Namenode is the table structure node type, NameNode.Value is the value field, NameNode.Table is the table field, and NameNode.Attribute is the attribute field;
Step 3.2, add embedded table nodes to the node mapping library, comprising:
Step 3.2.1, query the document structure in the database and retrieve the embedded documents;
Step 3.2.2, obtain the field names in each embedded document; encapsulate the embedded document name in the JSON format {DocumentNode: DocumentNode.Value | DocumentNode.TABLE} and the embedded document field names in the format {DocumentNode: {DocumentNode.Value | DocumentNode.TABLE.ATTRIBUTE}}, and store them in the node mapping library, where DocumentNode is the embedded document node type, DocumentNode.Value is the value field of the embedded document, DocumentNode.TABLE is the document name field of the embedded document, and DocumentNode.TABLE.ATTRIBUTE is the attribute field of the embedded document;
Step 3.3, add value nodes to the node mapping library, comprising:
Step 3.3.1, set the sampling ratio β;
Step 3.3.2, access the table fields in turn using the database query language, extract concrete values of each field at the sampling ratio, and eliminate duplicate values to improve the efficiency of value node matching;
Step 3.3.3, store the set of concrete values extracted for each table field in the node mapping library in the JSON format {valuenode: [{value | TABLE.ATTRIBUTE}]}, where valuenode is the value node type, value is the value field of the value node, and TABLE.ATTRIBUTE is the table and attribute field corresponding to the value node;
Step 3.4, for the other node types, define the node conversion content manually and store it in the node mapping library in the JSON format {NodeType: NodeValue | NodeType}.
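The value-node sampling of step 3.3 might look like the following Python sketch. The function names, the fixed random seed, and the concrete dictionary layout of each entry are assumptions made here; the claim's `{value | TABLE.ATTRIBUTE}` notation is not strict JSON, so the layout below is only one possible reading.

```python
import math
import random
from typing import Any, Dict, List


def sample_distinct(values: List[Any], beta: float, seed: int = 0) -> List[Any]:
    """Step 3.3.2: draw a fraction beta of the field's values, then drop duplicates."""
    if not values:
        return []
    k = min(len(values), max(1, math.ceil(len(values) * beta)))
    sampled = random.Random(seed).sample(values, k)
    return list(dict.fromkeys(sampled))      # de-duplicate, keeping first-seen order


def value_entries(table: str, attribute: str,
                  values: List[Any], beta: float) -> List[Dict[str, Any]]:
    """Step 3.3.3: wrap each sampled value with the table and attribute it came from."""
    return [
        {"valuenode": {"value": v, "field": "{}.{}".format(table, attribute)}}
        for v in sample_distinct(values, beta)
    ]
```

Sampling before de-duplication follows the order given in step 3.3.2; frequent values are then more likely to appear in the library than rare ones.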
4. The method for implementing a natural language query interface for big data according to claim 2, characterized in that step 4 comprises:
Step 4.1, based on the fields of the database, extend the domain-specific semantic dictionary by expanding the HowNet semantic library, comprising:
Step 4.1.1, segment each field value into words, compute word frequency statistics over the resulting word dictionary, and select the top topS words as extension seed words;
Step 4.1.2, using the basic sememes and relation sememes in HowNet, describe the selected topS words and the structural information of the database in terms of sememes, and add them to the semantic dictionary;
Step 4.2, using the extended semantic dictionary, compute the similarity between node w1 in the semantic dependency tree and each element NodeValue in the node mapping library, thereby mapping the nodes of the semantic dependency tree to elements of the node mapping library; sort by similarity and take the top K elements whose similarity exceeds the given threshold as the candidate mappings of node w1.
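The candidate selection of step 4.2 (threshold first, then the top K by similarity) can be sketched in Python as below. The function name and the shape of the library entries are hypothetical, and the similarity function is passed in as a parameter so that any word-similarity measure can be plugged in.

```python
import heapq
from typing import Any, Callable, Dict, List, Tuple


def top_k_candidates(word: str,
                     library: List[Dict[str, Any]],
                     similarity: Callable[[str, str], float],
                     k: int,
                     threshold: float) -> List[Tuple[float, Dict[str, Any]]]:
    """Score every library element against `word`, keep those above the
    threshold, and return the k best as (score, element) pairs, best first."""
    scored = [(similarity(word, entry["value"]), entry) for entry in library]
    passing = [pair for pair in scored if pair[0] > threshold]
    return heapq.nlargest(k, passing, key=lambda pair: pair[0])
```

If fewer than K elements pass the threshold, the returned list is simply shorter, which is the case in which the user would enter a mapping manually.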
5. The method for implementing a natural language query interface for big data according to claim 4, characterized in that step 4.2 comprises:
Step 4.2.1, first match the nodes of the semantic dependency tree against the node types other than value nodes, as follows:
Step 4.2.1.1, based on the HowNet semantic library, compute the similarity between node w1 in the semantic dependency tree and a value w2 under node type x in the node mapping library, as follows:
Step 4.2.1.1.1, load the HowNet semantic library file WHOLE.DAT and build the tree-shaped sememe hierarchy T;
Step 4.2.1.1.2, load the glossary.dar file, build the vocabulary, and look up the concepts possessed by node w1 and value w2 and their corresponding sememes;
Step 4.2.1.1.3, if each word corresponds to a single sememe, the similarity is computed by a formula in which d is the shortest path length connecting the two sememes in the tree-shaped sememe hierarchy T built in step 4.2.1.1.1 and α is an adjustable parameter; if the corresponding sememes form a set, the similarity of the two words is the mean of the similarities of the matched sememes in the set;
Step 4.2.1.2, select the top K candidate mapping values whose similarity is greater than or equal to the set threshold;
Step 4.2.2, if the similarity in step 4.2.1.2 is below the set threshold, compute the mapping of the word to value nodes, comprising:
Step 4.2.2.1, using the similarity calculation method of step 4.2.1.1, compute the similarity between node w1 and the values of each field;
Step 4.2.2.2, select the top K candidate mapping values;
Step 4.2.2.3, if the lowest of the top K candidate mapping values is below the set threshold, take the fields of the top K candidate mapping values, query all values of the corresponding fields in the database, compute the similarities one by one, and select the top K mapping values again.
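The sememe similarity of step 4.2.1.1.3 can be sketched in Python under one loudly stated assumption: the text here does not reproduce the formula, so the sketch uses the form α / (d + α) that is commonly used with HowNet sememe hierarchies. The child-to-parent dictionary representing the hierarchy T, the function names, the default α, and the averaging over all sememe pairs (rather than over selected matched pairs) are likewise illustrative choices.

```python
from itertools import product
from typing import Dict, Iterable, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Shortest path length between two sememes, via their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("sememes are not in the same tree")


def sememe_similarity(a: str, b: str, parent: Dict[str, str], alpha: float = 1.6) -> float:
    """Assumed form alpha / (d + alpha); equals 1.0 when the sememes coincide."""
    return alpha / (tree_distance(a, b, parent) + alpha)


def word_similarity(sememes_a: Iterable[str], sememes_b: Iterable[str],
                    parent: Dict[str, str], alpha: float = 1.6) -> float:
    """Mean sememe similarity over all pairs (a simplification of 'matched' pairs)."""
    pairs = list(product(sememes_a, sememes_b))
    return sum(sememe_similarity(x, y, parent, alpha) for x, y in pairs) / len(pairs)
```

With this form, similarity falls as the sememes move further apart in T, and α sets the distance at which the similarity drops to one half.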
6. The method for implementing a natural language query interface for big data according to claim 1, characterized in that step 7 comprises:
Step 7.1, according to the node mapping types, define the query tree grammar rules, as follows:
1) root node -> (query clause) (condition clause)*
2) query clause -> select node + GNP
3) condition clause -> operator node | (MR node + operator node) + (leftSubtree * rightSubtree)
4) leftSubtree -> GNP
5) rightSubtree -> GNP | value node | MIN | MAX
6) GNP -> (function node + GNP) | NP
7) NP -> table structure node | embedded table node + (table structure node | embedded table node)* (Condition)*
8) Condition -> value node | (operator node + value node)
where -> denotes a hierarchical (upper-lower level) relationship; * denotes a sibling relationship; + denotes a parent-child relationship; | denotes an "or" relationship; leftSubtree denotes the left subtree; rightSubtree denotes the right subtree; MIN denotes the minimum function; MAX denotes the maximum function; GNP and NP are custom intermediate subtree structures; Condition denotes a condition subtree;
Step 7.2, restructure the semantic dependency tree and convert it into the corresponding query tree, comprising:
Step 7.2.1, traverse the semantic dependency tree and randomly move nodes and subtrees to random positions, adjusting with reference to the database structure when moving nodes;
Step 7.2.2, traverse each newly generated query tree and score it according to the query tree grammar rules, where the more nodes conform to the grammar rules, the higher the score; store the query tree in the candidate queue;
Step 7.2.3, select the top t candidate query trees and return them to the front end for interaction, where the user chooses the most correct query tree structure.
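The scoring and ranking of steps 7.2.2 and 7.2.3 can be approximated with a short Python sketch. It reduces the grammar of step 7.1 to a table of permitted parent-child node kinds, which is a coarse simplification made only for illustration; the `ALLOWED_CHILDREN` table, the tuple encoding of trees, and the function names are all assumptions.

```python
from typing import List, Tuple

# A tree is encoded as (kind, [children]).
Tree = Tuple[str, list]

# Rough reading of the grammar as permitted parent -> child node kinds.
ALLOWED_CHILDREN = {
    "root": {"select", "operator", "MR"},
    "select": {"function", "table", "embedded_table"},
    "MR": {"operator"},
    "operator": {"function", "table", "embedded_table", "value"},
    "function": {"function", "table", "embedded_table"},
    "table": {"table", "embedded_table", "value", "operator"},
    "embedded_table": {"table", "embedded_table", "value", "operator"},
}


def score(tree: Tree, parent_kind: str = "root") -> int:
    """Count the nodes whose kind is permitted under their parent's kind."""
    kind, children = tree
    own = 1 if kind in ALLOWED_CHILDREN.get(parent_kind, set()) else 0
    return own + sum(score(child, kind) for child in children)


def top_t(candidates: List[Tree], t: int) -> List[Tree]:
    """Return the t highest-scoring candidate query trees, best first."""
    return sorted(candidates, key=score, reverse=True)[:t]
```

The top t trees returned this way would then be shown to the user, who picks the structure that best matches the intended query.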
7. The method for implementing a natural language query interface for big data according to claim 1, characterized in that step 8 comprises:
Step 8.1, eliminate semantic ambiguity in complex queries, comprising:
Step 8.1.1, define the concept of a core node: for a complex query, the core node is the topmost table structure node or embedded table node in the left or right subtree under the complex query condition;
Step 8.1.2, if the types of the core nodes of the left and right subtrees under a complex query condition are inconsistent, add the core node of the left subtree to the right subtree as its new core node;
Step 8.2, eliminate the semantic ambiguity of the returned field: add the table structure node or embedded table node under the select node of the left subtree to the right subtree;
Step 8.3, eliminate the semantic ambiguity of function nodes, comprising:
Step 8.3.1, if the table structure node or embedded table node under a function node is not a statistical numeric value, add a counting function node above that table structure node or embedded table node;
Step 8.3.2, for a function node located in the subtree of an operator node, add the function node in the left subtree under the operator node to the corresponding position in the right subtree, or add the function node in the right subtree under the operator node to the corresponding position in the left subtree.
8. The method for implementing a natural language query interface for big data according to claim 1, characterized in that step 9 comprises:
Step 9.1, define the concept of a query block and establish the nesting relationships between blocks, comprising:
Step 9.1.1, a query block is a subtree whose root node is a select node, a table structure node modified by a quantifier node, an embedded table node, or a function node; the subtree rooted at the select node is the main block;
Step 9.1.2, if the root node of block A is the parent node of the root node of another block B, block A is said to directly contain block B; the main block is the outermost block;
Step 9.2, convert the query tree into a NoSQL query statement;
Step 9.3, convert the query tree into an SQL query statement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710967726.6A CN107885786B (en) | 2017-10-17 | 2017-10-17 | Natural language query interface implementation method facing big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885786A (en) | 2018-04-06 |
CN107885786B (en) | 2021-10-26 |
Family
ID=61781656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710967726.6A Active CN107885786B (en) | 2017-10-17 | 2017-10-17 | Natural language query interface implementation method facing big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885786B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947794A (en) * | 2019-02-21 | 2019-06-28 | 东华大学 | A kind of interactive natural language inquiry conversion method |
CN110119404A (en) * | 2019-04-12 | 2019-08-13 | 杭州量之智能科技有限公司 | A kind of intelligence access system and method based on natural language understanding |
CN110213651A (en) * | 2019-05-28 | 2019-09-06 | 暨南大学 | A kind of intelligent merit Computer Aided Analysis System and method based on security protection video |
CN110334179A (en) * | 2019-05-22 | 2019-10-15 | 深圳追一科技有限公司 | Question and answer processing method, device, computer equipment and storage medium |
WO2020008187A1 (en) * | 2018-07-02 | 2020-01-09 | Babylon Partners Limited | A computer implemented method for extracting and reasoning with meaning from text |
CN110688394A (en) * | 2019-09-29 | 2020-01-14 | 浙江大学 | NL generation SQL method for novel power supply urban rail train big data operation and maintenance |
CN111538854A (en) * | 2020-04-27 | 2020-08-14 | 北京百度网讯科技有限公司 | Searching method and device |
CN112437917A (en) * | 2018-07-25 | 2021-03-02 | 甲骨文国际公司 | Natural language interface for databases using autonomous agents and thesaurus |
CN112507098A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Question processing method, question processing device, electronic equipment, storage medium and program product |
CN113536741A (en) * | 2020-04-17 | 2021-10-22 | 复旦大学 | Method and device for converting Chinese natural language into database language |
CN114090627A (en) * | 2022-01-19 | 2022-02-25 | 支付宝(杭州)信息技术有限公司 | Data query method and device |
CN114138817A (en) * | 2021-12-03 | 2022-03-04 | 中国建设银行股份有限公司 | Data query method, device, medium and product based on relational database |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103646032A (en) * | 2013-11-11 | 2014-03-19 | 漆桂林 | Database query method based on body and restricted natural language processing |
CN105279168A (en) * | 2014-06-24 | 2016-01-27 | 华为技术有限公司 | Data query method supporting natural language, open platform, and user terminal |
US20170097990A1 (en) * | 2014-03-03 | 2017-04-06 | Michael L. Hamm | Text-sql relational database |
CN107016012A (en) * | 2015-09-11 | 2017-08-04 | 谷歌公司 | Handle the failure in processing natural language querying |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020008187A1 (en) * | 2018-07-02 | 2020-01-09 | Babylon Partners Limited | A computer implemented method for extracting and reasoning with meaning from text |
US10846288B2 (en) | 2018-07-02 | 2020-11-24 | Babylon Partners Limited | Computer implemented method for extracting and reasoning with meaning from text |
CN112437917A (en) * | 2018-07-25 | 2021-03-02 | 甲骨文国际公司 | Natural language interface for databases using autonomous agents and thesaurus |
CN109947794A (en) * | 2019-02-21 | 2019-06-28 | 东华大学 | A kind of interactive natural language inquiry conversion method |
CN109947794B (en) * | 2019-02-21 | 2023-09-01 | 东华大学 | Interactive natural language query conversion method |
CN110119404B (en) * | 2019-04-12 | 2021-10-08 | 杭州量之智能科技有限公司 | Intelligent access system and method based on natural language understanding |
CN110119404A (en) * | 2019-04-12 | 2019-08-13 | 杭州量之智能科技有限公司 | A kind of intelligence access system and method based on natural language understanding |
CN110334179A (en) * | 2019-05-22 | 2019-10-15 | 深圳追一科技有限公司 | Question and answer processing method, device, computer equipment and storage medium |
CN110213651A (en) * | 2019-05-28 | 2019-09-06 | 暨南大学 | A kind of intelligent merit Computer Aided Analysis System and method based on security protection video |
CN110688394A (en) * | 2019-09-29 | 2020-01-14 | 浙江大学 | NL generation SQL method for novel power supply urban rail train big data operation and maintenance |
CN110688394B (en) * | 2019-09-29 | 2021-11-23 | 浙江大学 | NL generation SQL method for novel power supply urban rail train big data operation and maintenance |
CN113536741A (en) * | 2020-04-17 | 2021-10-22 | 复旦大学 | Method and device for converting Chinese natural language into database language |
CN113536741B (en) * | 2020-04-17 | 2022-10-14 | 复旦大学 | Method and device for converting Chinese natural language into database language |
CN111538854B (en) * | 2020-04-27 | 2023-08-08 | 北京百度网讯科技有限公司 | Searching method and device |
CN111538854A (en) * | 2020-04-27 | 2020-08-14 | 北京百度网讯科技有限公司 | Searching method and device |
CN112507098A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Question processing method, question processing device, electronic equipment, storage medium and program product |
CN112507098B (en) * | 2020-12-18 | 2022-01-28 | 北京百度网讯科技有限公司 | Question processing method, question processing device, electronic equipment, storage medium and program product |
CN114138817A (en) * | 2021-12-03 | 2022-03-04 | 中国建设银行股份有限公司 | Data query method, device, medium and product based on relational database |
CN114090627A (en) * | 2022-01-19 | 2022-02-25 | 支付宝(杭州)信息技术有限公司 | Data query method and device |
CN114090627B (en) * | 2022-01-19 | 2022-05-31 | 支付宝(杭州)信息技术有限公司 | Data query method and device |
Also Published As
Publication number | Publication date |
---|---|
CN107885786B (en) | 2021-10-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||