CN111897911B - Unstructured data query method and system based on secondary attribute graph - Google Patents


Info

Publication number
CN111897911B
Authority
CN
China
Prior art keywords
attribute
graph
query
primary
secondary attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010529960.2A
Other languages
Chinese (zh)
Other versions
CN111897911A (en
Inventor
沈志宏
赵子豪
周园春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN202010529960.2A
Publication of CN111897911A
Application granted
Publication of CN111897911B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for querying unstructured data based on a secondary attribute graph. The method comprises the following steps: 1) for a target database, taking the unstructured data of each record in the database as a primary attribute of the corresponding record; 2) extracting the intrinsic information in each primary attribute as a secondary attribute graph of that primary attribute; 3) extending the query language of the target database by adding a semantic operator "->", and extending the query engine of the target database so that it compiles and executes query statements conforming to the syntax of the semantic operator "->"; 4) the query engine looks up cached results satisfying the query conditions in the cache system; if there is no matching result, it searches the target database for matching records according to the primary attributes in the query conditions, extracts secondary attribute graphs from the primary attributes of the matching records, matches each of them against the secondary attribute graph in the query conditions, and returns the matching results.

Description

Unstructured data query method and system based on secondary attribute graph
Technical Field
The invention relates to the technical fields of unstructured data, data query languages, artificial intelligence and graph data models. Existing techniques cannot conveniently query the information contained in unstructured data and are weak at extracting and representing that information; to address this, the invention provides a method and a system for representing and querying unstructured data based on a secondary attribute graph.
Background
Unstructured data accounts for a large proportion of network data; pictures, sound recordings, videos and long plain texts are all unstructured data. Technologies for storing and querying structured data are mature, and solutions for storing and managing structured data are well established. However, as technology advances, data sources are becoming more varied, data volumes larger and data forms more complex. In many application scenarios, engineers must handle not only structured data with a canonical format, but also semi-structured data with a self-describing structure and even unstructured data with no fixed structure. Because its structure is flexible, such data is highly extensible and can express information very freely. But because its format is unconstrained, the storage and management of unstructured data has troubled the industry for many years.
Current management and query techniques for unstructured data focus mainly on retrieval based on metadata, such as file name, size, file category and tag values. Such simple retrieval does not make use of AI techniques and cannot directly query or consume the information contained in the unstructured data, which makes that data difficult to query and use. Some artificial intelligence methods can already extract information from unstructured data, for example speech-to-text conversion, face recognition and license plate number extraction, and the related algorithms achieve high accuracy. However, because AI algorithms have complex dependencies, are difficult to deploy and differ greatly from tool to tool, using them to obtain the information in unstructured data is inconvenient.
As the volume of unstructured data keeps growing and AI algorithms become more accurate and more varied, it is of great significance to develop a method and a system that can quickly query the information in unstructured data.
Disclosure of Invention
Aiming at the problem of querying and representing the information in unstructured data, the invention provides a method and a system for representing and querying that information based on a secondary attribute graph, with an implementation based on a graph database. The method represents the information in unstructured data as a secondary attribute graph: an AI algorithm extracts the specified scene information, the secondary attribute graph describes the scene contained in the unstructured data, and the information is obtained by querying the secondary attribute graph. This enables flexible representation and fast querying of the scene information in unstructured data.
The technical scheme adopted by the invention is as follows:
A method for querying unstructured data based on a secondary attribute graph comprises the following steps:
1) for a target database, taking the unstructured data of each record in the target database as the primary attribute of the corresponding record;
2) extracting the intrinsic information in each primary attribute, and extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
3) extending the query language of the target database: setting the symbols "()", "{}" and "-[]-" for describing the intrinsic information of the unstructured data, and setting a secondary attribute graph extraction symbol "->", wherein the symbol "->" is a binary connector whose left side is a primary attribute and whose right side is the name of a secondary attribute graph, and the semantic operator "->" is used in the form 'a->b', meaning that the secondary attribute graph named b in the primary attribute a is queried; and extending the query engine of the target database to compile and execute query statements conforming to the syntax of the semantic operator "->";
4) the query engine looks up, in the cache system, cached results satisfying the query conditions, and returns the cached results if there is a match; if there is no matching cached result, it searches the target database for matching records according to the primary attribute in the query conditions, extracts the secondary attribute graphs from the primary attributes of the matching records, matches each of them against the secondary attribute graph in the query conditions, and returns the matching results.
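As an illustration of the extraction operator in step 3), the following is a minimal Python sketch of how an expression of the form 'a->b' might be split into its parts. It is not the parser of the invention: the `variable.attribute->graphName` shape is taken from the example n.photo->locationGraph in the description, while the names `Extraction` and `parse_extraction` and the regular expression are assumptions made for this sketch.

```python
import re
from typing import NamedTuple


class Extraction(NamedTuple):
    variable: str    # query variable, e.g. "n"
    primary: str     # primary attribute name, e.g. "photo"
    graph_name: str  # secondary attribute graph name, e.g. "locationGraph"


# Hypothetical pattern: <variable>.<primary attribute>-><graph name>
_PATTERN = re.compile(r"^\s*(\w+)\.(\w+)\s*->\s*(\w+)\s*$")


def parse_extraction(expr: str) -> Extraction:
    """Split an 'a->b' expression into variable, primary attribute and graph name."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"not an extraction expression: {expr!r}")
    return Extraction(*match.groups())
```

A real query engine would handle this operator inside its grammar and syntax tree rather than with a regular expression; the sketch only shows the binary structure of the operator.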
Further, the node contains category and attribute set information, and the edge contains category information.
Further, the query is carried out by directly inputting the primary attribute and the secondary attribute graph; or inputting the information in the secondary attribute graph to inquire the secondary attribute graph, and then inputting the primary attribute and inquiring the secondary attribute graph selected from the inquired secondary attribute graph.
Furthermore, an algorithm mapping library is established, in which the correspondence between each AI algorithm and the different secondary attribute graphs is set, so that different AI algorithms can be called to extract the corresponding secondary attribute graphs from the primary attributes.
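The algorithm mapping library can be pictured as a registry from secondary attribute graph names to extraction functions. The Python sketch below is only an illustration under that assumption: the class `AlgorithmMappingLibrary`, its methods, the dictionary layout of the returned graph and the stand-in `fake_location_extractor` are all hypothetical, and no real AI model is called.

```python
from typing import Callable, Dict

# An extractor takes the raw bytes of a primary attribute and returns a graph.
Extractor = Callable[[bytes], dict]


class AlgorithmMappingLibrary:
    """Hypothetical registry: secondary attribute graph name -> AI algorithm."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register(self, graph_name: str, extractor: Extractor) -> None:
        self._extractors[graph_name] = extractor

    def extract(self, graph_name: str, primary_attribute: bytes) -> dict:
        if graph_name not in self._extractors:
            raise KeyError(f"no AI algorithm registered for {graph_name!r}")
        return self._extractors[graph_name](primary_attribute)


def fake_location_extractor(data: bytes) -> dict:
    # Stand-in for a real AI algorithm; it ignores its input and returns fixed data.
    return {"nodes": ["Alice", "Bob"], "edges": [("Alice", "nextTo", "Bob")]}


library = AlgorithmMappingLibrary()
library.register("locationGraph", fake_location_extractor)
graph = library.extract("locationGraph", b"raw image bytes")
```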
An unstructured data query system based on a secondary attribute graph is characterized by comprising an information extractor, a task scheduler and a query engine; wherein,
the information extractor is used for extracting the unstructured data of each record from the target database as the primary attribute of the corresponding record, calling the task scheduler to extract the intrinsic information in each primary attribute, and then extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
the task scheduler is used for calling different AI algorithms to extract different secondary attribute graphs from the intrinsic information of the primary attributes;
the query engine is used for querying the cache results meeting the query conditions from the cache system according to the query conditions, and if the cache results are matched, returning the cache results; if no matched query result exists, searching the matched record in the target database according to the primary attribute in the query condition, then extracting a secondary attribute graph from the primary attribute of the matched record, respectively matching with the secondary attribute graph in the query condition, and returning the matching result;
the query language of the target database is extended: the symbols "()", "{}" and "-[]-" are set for describing the intrinsic information of unstructured data, and a secondary attribute graph extraction symbol "->" is set, wherein the symbol "->" is a binary connector whose left side is a primary attribute and whose right side is the name of a secondary attribute graph, and the semantic operator "->" is used in the form 'a->b', meaning that the secondary attribute graph named b in the primary attribute a is queried; and the query engine of the target database is extended to compile and execute query statements conforming to the syntax of the semantic operator "->".
A method for querying unstructured data based on a secondary attribute graph comprises the following steps:
1) for a graph database based on an attribute graph model, in which nodes represent entities and edges represent the relationships between entities: taking the attribute data of each entity as the primary attribute of the corresponding node, extracting the intrinsic information in each primary attribute, and then extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
2) extending the Cypher query language of the graph database: setting the symbols "()", "{}" and "-[]-" for describing the intrinsic information of unstructured data, and setting a secondary attribute graph extraction symbol "->"; the semantic operator "->" is a binary operator whose left side is a primary attribute and whose right side is a secondary attribute graph, and its meaning is to extract the content of the secondary attribute graph from the primary attribute; and extending the Cypher query engine of the graph database to parse the query statement input by a user into a syntax tree and generate an execution plan;
3) the Cypher query engine queries the cache results meeting the query conditions from the cache according to the query conditions, and if the cache results are matched, the cache results are returned; if no matched query result exists, searching the matched node in the graph database according to the primary attribute in the query condition, then extracting a secondary attribute graph from the primary attribute of the matched node, respectively matching with the secondary attribute graph in the query condition, and returning the matching result.
An unstructured data query system based on a secondary attribute graph is characterized by comprising an information extractor, a task scheduler and a Cypher query engine; wherein,
the information extractor is used for extracting the unstructured data of each record from the graph database as the primary attribute of the corresponding record, calling the task scheduler to extract the intrinsic information in each primary attribute, and then extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
the task scheduler is used for calling different AI algorithms to extract different secondary attribute graphs from the intrinsic information of the primary attributes;
the Cypher query engine is used for querying the cache results meeting the query conditions from the cache according to the query conditions, and returning the cache results if the cache results are matched; if no matched query result exists, searching a matched node in the graph database according to the primary attribute in the query condition, then extracting a secondary attribute graph from the primary attribute of the matched node, respectively matching with the secondary attribute graphs in the query condition, and returning the matching result;
the Cypher query language of the graph database is extended: the symbols "()", "{}" and "-[]-" are set for describing the intrinsic information of unstructured data, and a secondary attribute graph extraction symbol "->" is set; the semantic operator "->" is a binary operator whose left side is a primary attribute and whose right side is a secondary attribute graph, and its meaning is to extract the content of the secondary attribute graph from the primary attribute; and the Cypher query engine of the graph database is extended to parse the query statement input by a user into a syntax tree and generate an execution plan.
The unstructured data information query method based on the secondary attribute graph comprises the following steps:
1) In the original database, unstructured data is represented as an attribute of a database record (hereinafter referred to as a primary attribute).
2) Some of the intrinsic information in the unstructured data (the primary attribute) is defined as a secondary attribute graph. Information in the same primary attribute is represented in the form of a graph, for example: (:person {type: "boy"})-[:SIT_ON]-(:horse {color: "white"}).
3) On the basis of step 2), the query language of the database is extended to add the ability to describe the intrinsic information of unstructured data: "()" represents nodes, "{}" represents attribute sets, and "-[]-" represents edges. The symbols "()", "{}" and "-[]-" are symbols of the graph data query language, and they are used to express the graph structure in the secondary attribute graph. Both nodes and edges carry a category, and a node may also carry an attribute set. Categories can be set freely by the user and mark the kind of entity, for example Person, Car or Article; category information marks the kind of a node or edge and narrows the search range. In particular, the invention adds a secondary attribute graph extraction symbol "->", a binary connector whose left side is a primary attribute and whose right side is the name of a secondary attribute graph. The name of the secondary attribute graph, like the secondary attribute name, can be freely specified by the user. For example, photo->locationGraph means that, for the primary attribute photo (a group photo), the secondary attribute graph of the positional relationships of the people in the group photo is obtained. A secondary attribute graph can be obtained directly through a query statement, for example: Match (n: {name: "Alice"}) Return n.photo->locationGraph. The information inside the secondary attribute graph can also be queried, for example: Match (n: {name: "Alice"}) With n.photo->locationGraph as graph, Match (m)-[:nextTo]-(n: {name: "Alice"}) from graph Return m.name.
4) On the basis of step 3), the query engine of the database is extended so that it compiles and executes query statements conforming to the syntax of step 3), allowing the value of a secondary attribute graph to be obtained in the form 'primary attribute->attribute graph'.
5) In the invention, the secondary attribute graph information is obtained by calling a specific AI algorithm to process the unstructured data. Each AI algorithm extracts the secondary attribute graphs of its corresponding pattern; one type of secondary attribute graph (e.g., "a child riding a horse") corresponds to one AI algorithm, and the correspondence between algorithms and attribute graphs is maintained by an algorithm mapping library.
6) The algorithm mapping library mentioned in step 5) configures a specified algorithm for a specified primary attribute; the algorithm can extract secondary attribute graph information from that primary attribute. The algorithm mapping library is responsible for maintaining the mapping between algorithms and secondary attribute graph patterns.
7) To accelerate the query in step 4), the invention provides a cache system. Each query first looks for its result in the cache system; if the cache system holds an up-to-date result for the query, the AI algorithm is not called and the result is returned directly. If the cache system holds no corresponding result, the AI algorithm is called to obtain the secondary attribute graph, and the result is stored in the cache system to accelerate subsequent queries.
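The graph notation of steps 2) and 3) can be made concrete with a small Python sketch that models nodes (category plus attribute set) and edges (category only) and renders them with "()", "{}" and "-[]-". The class names `Node` and `Edge` and the rendering functions are assumptions made for illustration; the attribute keys `type` and `color` follow the boy-on-a-horse example used in the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    category: str                                   # e.g. "person"
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    source: Node
    category: str                                   # e.g. "SIT_ON"
    target: Node


def render_node(node: Node) -> str:
    """Render a node as (:category {key: "value", ...})."""
    if not node.attributes:
        return f"(:{node.category})"
    attrs = ", ".join(f'{key}: "{value}"' for key, value in node.attributes.items())
    return f"(:{node.category} {{{attrs}}})"


def render_edge(edge: Edge) -> str:
    """Render an edge and its two end nodes as (...)-[:category]-(...)."""
    return f"{render_node(edge.source)}-[:{edge.category}]-{render_node(edge.target)}"


boy = Node("person", {"type": "boy"})
horse = Node("horse", {"color": "white"})
print(render_edge(Edge(boy, "SIT_ON", horse)))
# prints (:person {type: "boy"})-[:SIT_ON]-(:horse {color: "white"})
```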
In particular, the invention provides a graph-database-based implementation of the above method:
1. In a graph database based on an attribute graph model, data is organized as nodes and edges. Nodes represent real-world entities (such as people, commodities and organizations), and edges represent the relationships between entities (such as friendship and purchase relationships). On the basis of the graph database, the invention improves the query language and the execution engine and adds an algorithm mapping library, so that the system supports querying the information in unstructured data through an attribute graph. The system architecture is shown in FIG. 1; the main components are a cache layer, a graph database (graph system) and an algorithm mapping library (AI system).
2. Attribute data extends the information describing an entity (e.g., a person's name, date of birth, ID photo, or a photo of the person's car). In particular, the invention supports unstructured data as attribute data of an entity and calls it a "primary attribute". Certain specific information in the unstructured data is defined as a secondary attribute graph (e.g., a photograph of a boy riding a horse can be represented as (:person {type: "boy"})-[:SIT_ON]-(:horse {color: "white"})).
3. In the invention, the secondary attribute graph is obtained by calling a specific AI algorithm to process the unstructured data. One class of secondary attribute graph (e.g., a child riding a horse) corresponds to one AI algorithm, and the correspondence between algorithms and secondary attribute graphs is maintained by an algorithm mapping library.
4. The invention implements a cache layer to accelerate queries on secondary attribute graphs. The cache layer stores the secondary attribute graph query results from a certain time period; as long as neither the data nor the AI algorithm has changed, repeated queries for the same secondary attribute graph do not call the AI algorithm again.
5. The invention extends the Cypher query language to support the semantic extraction symbol "->", a binary operator whose left side is a primary attribute and whose right side is a secondary attribute graph; its meaning is to extract the content of the secondary attribute graph from the primary attribute.
6. On the basis of step 5, the Cypher query engine is extended. The engine parses the query statement input by a user into a syntax tree and generates an execution plan. When a query statement on a secondary attribute graph is executed, the cache layer of step 4 is searched first: on a hit, the result is returned and the AI algorithm is not called again; on a miss, the AI algorithm is called to process the primary attribute and obtain the secondary attribute graph, which is returned to the user and stored in the cache layer to accelerate the next query.
7. The cache layer of step 4 stores data as key-value pairs, where the key is the combination of the id of the unstructured data (the primary attribute) and the id of the algorithm, and the value is the result of the AI algorithm processing that unstructured data. When the algorithm or the value of the primary attribute is updated, the combined id changes as well, which invalidates the original cached result; this design ensures that the system obtains the latest secondary attribute graph.
8. For the algorithm mapping described in step 3, the invention implements an algorithm mapping library. It manages and maintains the correspondence between secondary attribute graph patterns and AI algorithms, receives the requests from the execution engine in step 6 to call an AI algorithm on unstructured data, processes the data, and returns the result.
9. To improve efficiency, the algorithm mapping library and the query engine of the graph database are deployed on different hosts, and the two hosts exchange data over the HTTP protocol.
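The cache behaviour of steps 4, 6 and 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the cache layer of the invention: the text only says that the key combines the id of the unstructured data with the id of the algorithm, so deriving the data id from a SHA-256 content hash, the class name `SecondaryGraphCache` and the `ai_calls` counter are all choices made for this sketch.

```python
import hashlib
from typing import Callable, Dict, Tuple


class SecondaryGraphCache:
    """Hypothetical cache keyed by (data id, algorithm id)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], dict] = {}
        self.ai_calls = 0  # counts how often the AI algorithm was really invoked

    @staticmethod
    def data_id(primary_attribute: bytes) -> str:
        # Assumption: the data id is a content hash, so changed data gets a new id.
        return hashlib.sha256(primary_attribute).hexdigest()

    def get_or_extract(
        self,
        primary_attribute: bytes,
        algorithm_id: str,
        extractor: Callable[[bytes], dict],
    ) -> dict:
        key = (self.data_id(primary_attribute), algorithm_id)
        if key in self._store:          # cache hit: skip the AI algorithm
            return self._store[key]
        self.ai_calls += 1              # cache miss: call the AI algorithm
        result = extractor(primary_attribute)
        self._store[key] = result       # store for subsequent queries
        return result


def stub_extractor(data: bytes) -> dict:
    # Stand-in for a real AI algorithm.
    return {"size": len(data)}


cache = SecondaryGraphCache()
cache.get_or_extract(b"photo-v1", "locator-1.0", stub_extractor)
cache.get_or_extract(b"photo-v1", "locator-1.0", stub_extractor)  # served from cache
cache.get_or_extract(b"photo-v1", "locator-2.0", stub_extractor)  # algorithm changed
cache.get_or_extract(b"photo-v2", "locator-2.0", stub_extractor)  # data changed
```

Because the key changes whenever the data or the algorithm changes, stale entries are never returned; this sketch does not evict them, which a real cache layer would need to do.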
The invention has the beneficial effects that:
The invention provides a novel method for representing and querying the information in unstructured data. On the basis of the database model, it introduces the concept of a secondary attribute graph, represents the information in unstructured data as secondary attribute graphs, and maps secondary attribute graph patterns to AI algorithms. It enables the information in unstructured data to be queried through the database query language, simplifies the process of calling AI algorithms to extract information from unstructured data, and makes such queries more flexible. The information extraction capability of AI algorithms and the information query capability of the database are fully combined, providing a new solution for querying the information in unstructured data.
The design of the cache layer reduces the calling times of the AI algorithm when the same secondary attribute is repeatedly inquired, and improves the inquiry efficiency of the system.
The design of separating the algorithm mapping library (AI system) from the graph database shields the complexity of algorithm dependence and improves the utilization efficiency of system resources.
Drawings
FIG. 1 is a system framework diagram of the present invention.
Detailed Description
The invention is further described by the following specific embodiments in conjunction with the accompanying drawings.
An academic knowledge graph contains data such as academic conference information, scholar information and scientific research institution information. The graph takes scholars, meetings and institutions as vertices, and relationships such as participation and affiliation as edges. Each meeting node holds the group photo of the corresponding academic meeting.
The user can first obtain a group photo through a query statement and then obtain the positional relationships of the people in the group photo through the secondary attribute graph (Match (n:Meeting) Return n.photo->locationGraph); or, going a step further, the user can directly search the positional relationships of the people in the group photo as an attribute graph to obtain the information inside the secondary attribute graph. For example, the following query statement obtains the names of the people next to Bob in the academic meeting group photo: Match (n:Meeting) With n.photo->locationGraph as graph, Match (m1) in graph Where (m1)-[:nextTo]-(m2 {name: "Bob"}) Return m1.name.
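To show what the second query computes once the secondary attribute graph has been extracted, the following Python sketch evaluates the undirected pattern (m1)-[:nextTo]-(m2 {name: "Bob"}) over an in-memory graph. The graph contents, the triple-based edge layout and the function name `neighbours` are assumptions made for this example; in the system described here the graph would come from an AI algorithm processing the group photo.

```python
from typing import List, Tuple

EdgeTriple = Tuple[str, str, str]  # (source name, edge category, target name)


def neighbours(edges: List[EdgeTriple], edge_category: str, name: str) -> List[str]:
    """Return the names connected to `name` by an edge of the given category.

    The edge is treated as undirected, matching the "-[]-" pattern.
    """
    result = []
    for source, category, target in edges:
        if category != edge_category:
            continue
        if source == name:
            result.append(target)
        elif target == name:
            result.append(source)
    return result


# Hypothetical locationGraph extracted from a meeting group photo.
location_edges = [
    ("Alice", "nextTo", "Bob"),
    ("Bob", "nextTo", "Carol"),
    ("Carol", "nextTo", "Dave"),
]

print(neighbours(location_edges, "nextTo", "Bob"))
# prints ['Alice', 'Carol']
```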
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and a person skilled in the art can make modifications or equivalent substitutions to the technical solution of the present invention without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (10)

1. A method for querying unstructured data based on a secondary attribute graph comprises the following steps:
1) for a target database, taking the unstructured data of each record in the target database as the primary attribute of the corresponding record;
2) extracting the intrinsic information in each primary attribute, and extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
3) extending the query language of the target database: setting the symbols "()", "{}" and "-[]-" for describing the intrinsic information of the unstructured data, and setting a secondary attribute graph extraction symbol "->", wherein the symbol "->" is a binary connector whose left side is a primary attribute and whose right side is the name of a secondary attribute graph, and the semantic operator "->" is used in the form 'a->b', meaning that the secondary attribute graph named b in the primary attribute a is queried; and extending the query engine of the target database to compile and execute query statements conforming to the syntax of the semantic operator "->";
4) the query engine queries the cache results meeting the query conditions from the cache system according to the query conditions, and if the cache results are matched, the cache results are returned; if no matched query result exists, searching the matched record in the target database according to the primary attribute in the query condition, then extracting the secondary attribute graph from the primary attribute of the matched record, respectively matching with the secondary attribute graph in the query condition, and returning the matching result.
2. The method of claim 1, wherein a node contains category and attribute set information and an edge contains category information.
3. The method of claim 1 or 2, wherein the query is made by directly inputting the primary attribute and the secondary attribute graph; or inputting the information in the secondary attribute graph to inquire the secondary attribute graph, and then inputting the primary attribute and inquiring the secondary attribute graph selected from the inquired secondary attribute graph.
4. The method of claim 1, wherein an algorithm mapping library is established, and the correspondence between each AI algorithm and the different secondary attribute graphs is set, for invoking different AI algorithms to extract the corresponding secondary attribute graphs from the primary attributes.
5. An unstructured data query system based on a secondary attribute graph is characterized by comprising an information extractor, a task scheduler and a query engine; wherein,
the information extractor is used for extracting the unstructured data of each record from the target database as the primary attribute of the corresponding record, calling the task scheduler to extract the intrinsic information in each primary attribute, and then extracting nodes and node attributes from the intrinsic information to construct an attribute graph, which serves as a secondary attribute graph of the primary attribute; wherein in the secondary attribute graph "()" represents nodes, "{}" represents the attribute sets of nodes, and "-[]-" represents edges between nodes;
the task scheduler is used for calling different AI algorithms to extract different secondary attribute graphs from the intrinsic information of the primary attributes;
the query engine is used for querying the cache results meeting the query conditions from the cache system according to the query conditions, and if the cache results are matched, returning the cache results; if no matched query result exists, searching the matched record in the target database according to the primary attribute in the query condition, then extracting a secondary attribute graph from the primary attribute of the matched record, respectively matching with the secondary attribute graph in the query condition, and returning the matching result;
the query language of the target database is extended: the symbols "()", "{}" and "-[]-" are set for describing the intrinsic information of unstructured data, and a secondary attribute graph extraction symbol "->" is set, wherein the symbol "->" is a binary connector whose left side is a primary attribute and whose right side is the name of a secondary attribute graph, and the semantic operator "->" is used in the form 'a->b', meaning that the secondary attribute graph named b in the primary attribute a is queried; and the query engine of the target database is extended to compile and execute query statements conforming to the syntax of the semantic operator "->".
6. The system of claim 5, wherein a node contains category and attribute set information and an edge contains category information.
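As an illustration of the notation, the sketch below models a node with a category and an attribute set, and an edge with a category, and renders them with "()", "{}" and "-[]-". The class names, the `render` methods and the Cypher-style `:Category` prefix are assumptions made for this example; the claims fix only the three bracket symbols.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    category: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # "()" wraps the node; "{}" wraps its attribute set.
        if not self.attributes:
            return f"(:{self.category})"
        attrs = ", ".join(f"{k}: '{v}'" for k, v in self.attributes.items())
        return f"(:{self.category} {{{attrs}}})"


@dataclass
class Edge:
    source: Node
    category: str
    target: Node

    def render(self) -> str:
        # "-[]-" is the edge drawn between the two nodes.
        return f"{self.source.render()}-[:{self.category}]-{self.target.render()}"


person = Node("Person", {"name": "Alice"})
item = Node("Object", {"kind": "cup"})
rendered = Edge(person, "HOLDS", item).render()
```

Here `rendered` is the string `(:Person {name: 'Alice'})-[:HOLDS]-(:Object {kind: 'cup'})`.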
7. The system of claim 5 or 6, wherein the query is made by directly inputting the primary attribute and the secondary attribute graph; or by first inputting information contained in a secondary attribute graph to look up secondary attribute graphs, and then inputting the primary attribute together with a secondary attribute graph selected from those found.
8. A method for querying unstructured data based on a secondary attribute graph, comprising the following steps:
1) for a graph database based on the attribute graph model, in which nodes represent entities and edges represent the relationships between entities: taking the attribute data of each entity as the primary attribute of the corresponding node, extracting the intrinsic information of each primary attribute, and then extracting nodes and their attributes from the intrinsic information to construct an attribute graph that serves as a secondary attribute graph of the primary attribute; wherein, in the secondary attribute graph, "()" denotes a node, "{}" denotes the attribute set of a node, and "-[]-" denotes an edge between nodes;
2) extending the Cypher query language of the graph database by providing the symbols "()", "{}" and "-[]-" for describing the intrinsic information of unstructured data, and a secondary attribute graph extraction symbol "->"; the semantic operator "->" is a binary operator whose left side is a primary attribute and whose right side is a secondary attribute graph, and its meaning is to extract the content of the secondary attribute graph from the primary attribute; and extending the Cypher query engine of the graph database to parse the query statement input by a user into a syntax tree and generate an execution plan;
3) the Cypher query engine looks up, in the cache, cached results that satisfy the query conditions, and returns the cached result if one matches; if no cached result matches, it searches the graph database for nodes that match the primary attribute in the query conditions, then extracts secondary attribute graphs from the primary attributes of the matched nodes, matches each of them against the secondary attribute graph in the query conditions, and returns the matching results.
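The cache-first flow of step 3) can be sketched as follows. This is a toy model under stated assumptions, not the claimed engine: records are plain dictionaries, a secondary attribute graph is reduced to a set of (category, value) nodes, `toy_extract` stands in for an AI algorithm, and writing the result back to the cache after a miss is an assumption that the claim does not spell out. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

NodeKey = Tuple[str, str]   # toy node: (category, value)
Graph = Set[NodeKey]        # toy secondary attribute graph: a set of nodes


class QueryEngine:
    def __init__(self, records: List[dict], extract: Callable[[str, str], Graph]):
        self.records = records
        self.extract = extract
        self.cache: Dict[Tuple[str, str, NodeKey], List[int]] = {}
        self.extract_calls = 0  # counts extractor invocations, for illustration

    def query(self, primary_attr: str, graph_name: str, wanted: NodeKey) -> List[int]:
        key = (primary_attr, graph_name, wanted)
        if key in self.cache:                 # a cached result matches: return it
            return self.cache[key]
        hits = []
        for record in self.records:           # otherwise match on the primary attribute
            if primary_attr not in record:
                continue
            self.extract_calls += 1
            graph = self.extract(record[primary_attr], graph_name)  # extract the graph
            if wanted in graph:               # match against the queried graph
                hits.append(record["id"])
        self.cache[key] = hits                # assumption: store the result for reuse
        return hits


def toy_extract(text: str, graph_name: str) -> Graph:
    # Stand-in for an AI algorithm: each lowercased word becomes one node.
    return {(graph_name, word) for word in text.lower().split()}


records = [
    {"id": 1, "abstract": "Graph query engine"},
    {"id": 2, "abstract": "Image classifier"},
    {"id": 3, "title": "No abstract here"},
]
engine = QueryEngine(records, toy_extract)
first = engine.query("abstract", "keywords", ("keywords", "graph"))
second = engine.query("abstract", "keywords", ("keywords", "graph"))
```

The second call is answered from the cache, so the extractor runs only for the two records that carry an `abstract` attribute during the first call.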
9. The method of claim 8, wherein an algorithm mapping library is established that records the correspondence between each AI algorithm and the different secondary attribute graphs, so that the appropriate AI algorithm can be invoked to extract the corresponding secondary attribute graph from a primary attribute.
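One way to picture such an algorithm mapping library is a table from secondary attribute graph names to extraction routines. The sketch below is an assumption-laden illustration: the two placeholder functions are trivial string routines standing in for AI algorithms, and `ALGORITHM_LIBRARY` and `schedule` are hypothetical names.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], List[str]]


def extract_keywords(text: str) -> List[str]:
    # Placeholder "algorithm": lowercased words longer than four characters.
    return [w for w in text.lower().split() if len(w) > 4]


def extract_numbers(text: str) -> List[str]:
    # Placeholder "algorithm": tokens made only of digits.
    return [w for w in text.split() if w.isdigit()]


# The mapping library: secondary attribute graph name -> algorithm.
ALGORITHM_LIBRARY: Dict[str, Extractor] = {
    "keywords": extract_keywords,
    "numbers": extract_numbers,
}


def schedule(graph_name: str, primary_attribute: str) -> List[str]:
    """Look up the algorithm registered for graph_name and run it."""
    try:
        algorithm = ALGORITHM_LIBRARY[graph_name]
    except KeyError:
        raise LookupError(f"no algorithm registered for {graph_name!r}") from None
    return algorithm(primary_attribute)
```

With this table, `schedule("numbers", "Graph query engine 2020")` dispatches to `extract_numbers`, and a new kind of secondary attribute graph is supported by adding one entry to the dictionary.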
10. An unstructured data query system based on a secondary attribute graph, characterized by comprising an information extractor, a task scheduler and a Cypher query engine; wherein,
the information extractor is used for extracting the unstructured data of each record in the graph database as the primary attribute of that record; calling the task scheduler to extract the intrinsic information of each primary attribute, and then extracting nodes and their attributes from the intrinsic information to construct an attribute graph that serves as a secondary attribute graph of the primary attribute; wherein, in the secondary attribute graph, "()" denotes a node, "{}" denotes the attribute set of a node, and "-[]-" denotes an edge between nodes;
the task scheduler is used for calling different AI algorithms to extract different secondary attribute graphs from the intrinsic information of the primary attributes;
the Cypher query engine is used for looking up, in the cache, cached results that satisfy the query conditions, and returning the cached result if one matches; if no cached result matches, searching the graph database for nodes that match the primary attribute in the query conditions, then extracting secondary attribute graphs from the primary attributes of the matched nodes, matching each of them against the secondary attribute graph in the query conditions, and returning the matching results;
the Cypher query language of the graph database is extended: the symbols "()", "{}" and "-[]-" are provided for describing the intrinsic information of unstructured data, together with a secondary attribute graph extraction symbol "->"; the semantic operator "->" is a binary operator whose left side is a primary attribute and whose right side is a secondary attribute graph, and its meaning is to extract the content of the secondary attribute graph from the primary attribute; and the Cypher query engine of the graph database is extended to parse the query statement input by a user into a syntax tree and generate an execution plan.
CN202010529960.2A 2020-06-11 2020-06-11 Unstructured data query method and system based on secondary attribute graph Active CN111897911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010529960.2A CN111897911B (en) 2020-06-11 2020-06-11 Unstructured data query method and system based on secondary attribute graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010529960.2A CN111897911B (en) 2020-06-11 2020-06-11 Unstructured data query method and system based on secondary attribute graph

Publications (2)

Publication Number Publication Date
CN111897911A CN111897911A (en) 2020-11-06
CN111897911B true CN111897911B (en) 2021-08-31

Family

ID=73206260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010529960.2A Active CN111897911B (en) 2020-06-11 2020-06-11 Unstructured data query method and system based on secondary attribute graph

Country Status (1)

Country Link
CN (1) CN111897911B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918305B (en) * 2021-10-29 2024-06-25 平安银行股份有限公司 Node scheduling method, node scheduling device, electronic equipment and readable storage medium
CN116150437B (en) * 2023-04-12 2023-09-26 阿里巴巴(中国)有限公司 Graph query method

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101305366B (en) * 2005-11-29 2013-02-06 国际商业机器公司 Method and system for extracting and visualizing graph-structured relations from unstructured text
CN102541521B (en) * 2010-12-17 2015-03-25 中国银联股份有限公司 Automatic operating instruction generating device based on structured query language and method
CN102289482A (en) * 2011-08-02 2011-12-21 北京航空航天大学 Unstructured data query method
CN104331517A (en) * 2014-11-26 2015-02-04 北京优特捷信息技术有限公司 Retrieval method and retrieval device
US9569733B2 (en) * 2015-02-20 2017-02-14 International Business Machines Corporation Extracting complex entities and relationships from unstructured data
CN106156135A (en) * 2015-04-10 2016-11-23 华为技术有限公司 The method and device of inquiry data
US10313365B2 (en) * 2016-08-15 2019-06-04 International Business Machines Corporation Cognitive offense analysis using enriched graphs
CN108268600B (en) * 2017-12-20 2020-09-08 北京邮电大学 AI-based unstructured data management method and device
CN108470040B (en) * 2018-02-11 2021-03-09 中国石油天然气股份有限公司 Method and device for warehousing unstructured data
CN109241080B (en) * 2018-09-29 2020-09-29 焦点科技股份有限公司 Construction and use method and system of FQL query language
CN109582831B (en) * 2018-10-16 2022-02-01 中国科学院计算机网络信息中心 Graph database management system supporting unstructured data storage and query
CN109597919B (en) * 2018-10-18 2021-11-09 中国科学院计算机网络信息中心 Data management method and system fusing graph database and artificial intelligence algorithm
US10572522B1 (en) * 2018-12-21 2020-02-25 Impira Inc. Database for unstructured data
CN110046236B (en) * 2019-03-20 2022-12-20 腾讯科技(深圳)有限公司 Unstructured data retrieval method and device
CN110688544A (en) * 2019-10-17 2020-01-14 北京锐安科技有限公司 Method, device and storage medium for querying database
CN110704698B (en) * 2019-12-13 2020-04-10 中国人民解放军国防科技大学 Correlation and query method for unstructured massive network security data

Also Published As

Publication number Publication date
CN111897911A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN109284363B (en) Question answering method and device, electronic equipment and storage medium
CN110941612B (en) Autonomous data lake construction system and method based on associated data
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
CN109325040B (en) FAQ question-answer library generalization method, device and equipment
CN106570081A (en) Semantic net based large scale offline data analysis framework
CN114218472A (en) Intelligent search system based on knowledge graph
CN114218400A (en) Semantic-based data lake query system and method
CN111651447B (en) Intelligent construction life-span data processing, analyzing and controlling system
CN109189959A (en) A kind of method and device constructing image data base
CN111897911B (en) Unstructured data query method and system based on secondary attribute graph
CN112818092B (en) Knowledge graph query statement generation method, device, equipment and storage medium
CN105631749A (en) User portrait calculation method based on statistical data
Debattista et al. Linked'Big'Data: towards a manifold increase in big data value and veracity
Sarma et al. Data modeling in dataspace support platforms
CN107193882A (en) Why not query answer methods based on figure matching on RDF data
Sarma et al. Uncertainty in data integration and dataspace support platforms
CN118643134A (en) Retrieval enhancement generation system and method based on knowledge graph
CN110532358A (en) A kind of template automatic generation method towards knowledge base question and answer
CN113094449A (en) Large-scale knowledge map storage scheme based on distributed key value library
CN112015908A (en) Knowledge graph construction method and system, and query method and system
CN111831787B (en) Unstructured data information query method and system based on secondary attributes
CN107807977A (en) A kind of object properties Metadata Extraction system based on configuration
CN108241709A (en) A kind of data integrating method, device and system
Kolas et al. Spatially-augmented knowledgebase
Lee et al. LifeLogOn: A practical lifelog system for building and exploiting lifelog ontology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant