CN115357682A - Semantic information query method and device for multi-version unstructured data - Google Patents

Semantic information query method and device for multi-version unstructured data

Info

Publication number
CN115357682A
CN115357682A CN202210871371.1A CN202210871371A CN115357682A CN 115357682 A CN115357682 A CN 115357682A CN 202210871371 A CN202210871371 A CN 202210871371A CN 115357682 A CN115357682 A CN 115357682A
Authority
CN
China
Prior art keywords
semantic information
unstructured data
node
version
data object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210871371.1A
Other languages
Chinese (zh)
Inventor
沈志宏
赵子豪
路长发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS filed Critical Computer Network Information Center of CAS
Priority to CN202210871371.1A priority Critical patent/CN115357682A/en
Publication of CN115357682A publication Critical patent/CN115357682A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic information query method and a semantic information query device for multi-version unstructured data. The method comprises the following steps: acquiring a query statement; parsing the query statement into an abstract syntax tree; running the node search operator at the node search node to obtain a specified object attribute calculation node; obtaining the version number of the unstructured data object and the semantic information name based on the query statement at the version extraction node; calculating the attribute value of the unstructured data object at a designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object; and obtaining a semantic information query result at a semantic information computing node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name. The invention can support the query of multi-version unstructured data semantic information.

Description

Multi-version unstructured data oriented semantic information query method and device
Technical Field
The invention relates to the fields of unstructured data, artificial intelligence, query languages, databases and the like, and in particular provides a semantic information query method and device for multi-version unstructured data.
Background
Unstructured data broadly refers to data without a predefined structure, such as long text, pictures, video and audio. It is usually stored in a computer system as a binary string; such data is typically large in volume and cannot be directly understood by the computer. Users' query needs for unstructured data focus mainly on the information contained in it. This information should be semantic and understandable, and is referred to as semantic information. Traditional data query technology cannot query the semantic information of unstructured data, but progress in artificial intelligence has opened a new direction for the analysis and querying of unstructured data. At present, artificial intelligence can perform tasks such as face recognition, object recognition, speech recognition and sentiment analysis with relatively high accuracy. The information in unstructured data can therefore be obtained through artificial intelligence, which further promotes the development of unstructured data query technology.
A semantic information query on unstructured data is essentially a query, under a certain rule, about a certain state of a certain object. Unstructured data is essentially a description of some state of an object; when the state of the object changes, the content of the unstructured data changes as well, which can be regarded as a version change of the unstructured data object. Likewise, a change in the semantic information extraction rule causes a version change of the semantic information. The prior art can only query one or more pieces of semantic information of a single unstructured data object; it cannot query a specified version of the unstructured data object or of the semantic information.
In addition, association relationships generally exist among entities in the natural world. The graph model models data in terms of such relationships and can therefore highlight the relationships among entities, so it is also widely used for managing and querying unstructured data. However, existing graph data query languages do not support, at the syntax level, version queries on unstructured data objects and semantic information.
Disclosure of Invention
To address the above problems, the invention builds on an existing graph data query language and, by extending its syntax, modifying the parsing of query statements, and modifying the optimization and execution logic of the query plan, implements a semantic information query method and device for multi-version unstructured data that supports queries on the semantic information of multi-version unstructured data.
The technical content of the invention comprises:
a semantic information query method oriented to multi-version unstructured data, the method comprising:
acquiring a query statement; wherein the query statement comprises: a node search operator, and an unstructured data object and a semantic information name connected by a first connector;
parsing the query statement into an abstract syntax tree; wherein the abstract syntax tree comprises: a node search node, a version extraction node, an object attribute calculation node and a semantic information calculation node;
running the node search operator at the node search node to obtain a designated object attribute calculation node;
obtaining the version number of the unstructured data object and the semantic information name based on the query statement at the version extraction node;
calculating the attribute value of the unstructured data object at a designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object;
and obtaining a semantic information query result at a semantic information computing node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name.
Further, the obtaining, at the version extraction node, the version number of the unstructured data object and the semantic information name based on the query statement includes:
reading the query statement;
when the unstructured data object or the semantic information name is read, judging whether a second connector follows the unstructured data object or the semantic information name;
if the second connector is followed, reading all characters between the designated symbols to obtain the version number of the unstructured data object or the semantic information name;
and if the second connector is not followed, taking the version number of the latest version of the unstructured data object or the latest version of the semantic information name as the version number of the unstructured data object or the semantic information name.
Further, the designated symbols include angle brackets.
Further, the query statement is obtained by extending the Cypher query language.
Further, the unstructured data objects include: nodes and relationships.
Further, the version number of the semantic information name is generated by the following steps:
acquiring semantic information corresponding to the semantic information name;
acquiring the version number of the artificial intelligence model for extracting the semantic information;
and taking the version number of the artificial intelligence model as the version number of the semantic information name.
A semantic information query device facing multi-version unstructured data comprises:
a query statement acquisition module for acquiring a query statement; wherein the query statement comprises: a node search operator, and an unstructured data object and a semantic information name connected by a first connector;
a query statement analysis module for parsing the query statement into an abstract syntax tree; wherein the abstract syntax tree comprises: a node search node, a version extraction node, an object attribute calculation node and a semantic information calculation node; running the node search operator at the node search node to obtain a designated object attribute calculation node; obtaining the version number of the unstructured data object and the semantic information name based on the query statement at the version extraction node; calculating the attribute value of the unstructured data object at the designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object; and obtaining a semantic information query result at the semantic information calculation node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name.
A storage medium having a computer program stored therein, wherein the computer program is configured to execute the above semantic information query method for multi-version unstructured data when running.
A computer device comprising a memory in which a computer program is stored and a processor configured to run the computer program to perform the above-mentioned semantic information query method for multi-version unstructured data.
A computer program product, when running on a computer device, causes the computer device to execute the above semantic information query method for multi-version unstructured data.
Compared with the prior art, the method provided by the invention has the following advantages and effects:
1. The query language provided by the invention supports expressing, at the syntax level, the version of unstructured data and the version of the semantic information in that data.
2. Based on this query language, the invention can retrieve the semantic information of the corresponding version of an unstructured data object from the semantic information of multi-version unstructured data.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 shows the AST when only the version of the unstructured data object is specified.
FIG. 3 shows the AST when only the version of the semantic information is specified.
FIG. 4 shows the AST when the versions of both the unstructured data object and the semantic information are specified.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is to be understood that the described embodiments are merely specific embodiments of the present invention, rather than all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The semantic information query method for multi-version unstructured data of the invention, as shown in FIG. 1, comprises the following steps:
Step 1: A query statement is obtained.
The query statement of the invention is an extension of the Cypher query language. It comprises a node search operator, and an unstructured data object and a semantic information name connected by a first connector, so that the query statement supports versioned queries on the semantic information of unstructured data.
In addition to the conventional node search operator, the query statement uses the "->" symbol to denote the semantic information of an unstructured data object. This symbol is a binary connector: its left side is the unstructured data object and its right side is the name of the semantic information.
In one example, when a particular version of semantic information is queried, the "@" symbol is used to denote the version of an object. The "@" symbol is a binary connector with a "versionable" object on its left and the version number of that object on its right. "Versionable" objects include nodes, relationships, attributes (both structured and unstructured data) and semantic information.
The version number is written as <$version>; that is, it must be enclosed in a pair of angle brackets. The content of the version number may be a character string or a number.
In another example, when the semantic information is to be obtained with the latest version of the AI model for that semantic information, the "@" symbol may be omitted, i.e., no version number is given.
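The syntax described above can be illustrated with a short Python sketch that splits a returned expression of the form variable.attribute[@<version>]->name[@<version>] into its parts. The regular expression and the function name split_expression are assumptions made for illustration; they are not the parser of the invention, and the sketch assumes a version number may contain any character other than ">".

```python
import re

# Hypothetical pattern for: var.attr[@<version>]->name[@<version>]
# A version number is assumed to be any non-empty run of characters other than '>'.
EXPR = re.compile(
    r"^(?P<var>\w+)\.(?P<attr>\w+)"
    r"(?:@<(?P<obj_version>[^>]+)>)?"
    r"->(?P<sem>\w+)"
    r"(?:@<(?P<sem_version>[^>]+)>)?$"
)

def split_expression(text):
    """Return the parts of a versioned semantic-information expression.

    A missing version is reported as None, meaning "use the latest version".
    """
    m = EXPR.match(text.strip().rstrip(";"))
    if m is None:
        raise ValueError(f"not a semantic-information expression: {text!r}")
    return m.groupdict()

print(split_expression("n.Blob@<v-b4>->face@<v-sp4>"))
```

Leaving out either "@<...>" suffix yields None for that version, which mirrors the rule that an omitted version number selects the latest version.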
In addition, the version number of the semantic information is obtained as follows:
The invention obtains the semantic information content of unstructured data by using artificial intelligence models, where each kind of semantic information corresponds to a class of artificial intelligence models. For example, the semantic information face corresponds to a face feature extraction model (face feature extraction models are known technology).
Definition of the semantic information extraction model version: the successive models used to extract a given kind of semantic information are treated as different versions of the model. For example, the model first used in the system to extract face semantic information is regarded as version v-m1; when it is later replaced by a higher-precision model during an upgrade, the replacement is regarded as version v-m2.
Definition of the semantic information version: the versions of the artificial intelligence model are in one-to-one correspondence with the version numbers of the semantic information. That is, for the same unstructured data object, semantic information extracted by different versions of the artificial intelligence model is regarded as different versions of the semantic information. For example, suppose the system has two face feature extraction models, with version numbers v-m1 and v-m2, both used to extract the semantic information face. Then the face extracted by the v-m1 model is taken as version v1, and the face extracted by the v-m2 model is taken as version v2.
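The one-to-one correspondence between model versions and semantic information versions can be sketched as a small registry. The class ModelRegistry, its methods and the lambda "models" are hypothetical stand-ins, and treating the most recently registered version as the latest one is an assumption of this sketch rather than something the description specifies.

```python
from collections import OrderedDict

class ModelRegistry:
    """Hypothetical registry: semantic-information name -> ordered model versions."""

    def __init__(self):
        # name -> OrderedDict(semantic version -> (model version, callable))
        self._models = {}

    def register(self, name, semantic_version, model_version, model):
        self._models.setdefault(name, OrderedDict())[semantic_version] = (model_version, model)

    def resolve(self, name, semantic_version=None):
        """Return (semantic version, model version, model).

        None selects the most recently registered version (assumed to be the latest).
        """
        versions = self._models[name]
        if semantic_version is None:
            semantic_version = next(reversed(versions))
        model_version, model = versions[semantic_version]
        return semantic_version, model_version, model

registry = ModelRegistry()
# Placeholder functions stand in for real face feature extraction models.
registry.register("face", "v1", "v-m1", lambda blob: ("face-by-v-m1", len(blob)))
registry.register("face", "v2", "v-m2", lambda blob: ("face-by-v-m2", len(blob)))
print(registry.resolve("face")[:2])        # ('v2', 'v-m2')
print(registry.resolve("face", "v1")[:2])  # ('v1', 'v-m1')
```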
Step 2: the query statement is parsed into an abstract syntax tree.
The invention parses the entire query statement into an abstract syntax tree (AST; see opencypher/front-end: Parsing, AST and semantic analysis for the Cypher Query Language), so that the entire query statement is represented as a tree. Each data operation in the query statement corresponds to one or more nodes of the tree. The abstract syntax tree comprises at least a node search node, a version extraction node, an object attribute calculation node and a semantic information calculation node.
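One way to picture the four kinds of tree node is as immutable records that nest inside one another. The following Python sketch is illustrative only: the class and field names are assumptions, and the tree is built by hand for the statement MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face; rather than produced by a parser.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NodeSearch:
    """Node search node, e.g. MATCH (n) WHERE id(n)=1."""
    variable: str
    node_id: int

@dataclass(frozen=True)
class VersionNumber:
    """Version extraction node; plain_num None stands for "latest version"."""
    plain_num: Optional[str]

@dataclass(frozen=True)
class PropertyAtVersion:
    """Object attribute calculation node."""
    source: NodeSearch
    attribute: str
    version: VersionNumber

@dataclass(frozen=True)
class CustomPropertyAtVersion:
    """Semantic information calculation node."""
    prop: PropertyAtVersion
    sub_property: str
    version: VersionNumber

# Hand-built tree for: MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face;
ast = CustomPropertyAtVersion(
    prop=PropertyAtVersion(NodeSearch("n", 1), "Blob", VersionNumber("v-b4")),
    sub_property="face",
    version=VersionNumber(None),  # no "@" after face, so the latest version applies
)
print(ast)
```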
Step 3: The node search operator is run at the node search node to obtain the designated object attribute calculation node.
The invention first runs the node search operator MATCH (n) WHERE id(n) in the query statement to obtain the designated object attribute calculation node for subsequent operations.
Step 4: The version numbers of the unstructured data object and of the semantic information name are obtained at the version extraction node based on the query statement.
The version extraction node may be denoted PPTVersionNumber(plainNum) and represents a version number in the abstract syntax tree. The value of plainNum is the value of <$version> in the query statement.
When the "@" symbol is read, it is determined whether the first character after it is "<". If not, the latest version in the current system is used by default. If so, all characters from there up to the first subsequent ">" character are read and taken as the version number, i.e., the value of plainNum in PPTVersionNumber(plainNum).
In one example, when the parser reads a specific unstructured attribute, it determines whether the attribute is followed by the "@" symbol. If not, the latest version of the unstructured attribute in the current system is used by default. If so, PPTVersionNumber(plainNum) is invoked to parse the version number $p-versionNumber of the object.
When the parser reads the semantic information of a designated piece of unstructured data, it determines whether the semantic information is followed by the "@" symbol. If not, the AI model corresponding to the latest version of the semantic information in the current system is invoked by default to obtain the semantic information. If so, PPTVersionNumber(plainNum) is invoked to parse the version number $s-versionNumber of the semantic information.
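The character-level rule described in this step can be sketched as a small scanning function. The name read_version and its (version, next position) return convention are assumptions for illustration, as is the choice to raise an error when the closing ">" is missing, a case the description does not cover.

```python
def read_version(text, pos):
    """Read an optional '@<version>' suffix starting at text[pos].

    Returns (version, next_pos). version is None when no explicit version is
    given, which the caller is expected to treat as "latest version".
    """
    if pos >= len(text) or text[pos] != "@":
        return None, pos                   # no "@": default to the latest version
    if pos + 1 >= len(text) or text[pos + 1] != "<":
        return None, pos + 1               # "@" not followed by "<": latest version
    end = text.find(">", pos + 2)          # first ">" after the opening "<"
    if end == -1:
        # Assumption of this sketch: an unterminated version is an error.
        raise ValueError("unterminated version number: missing '>'")
    return text[pos + 2:end], end + 1      # the characters between "<" and ">"

expr = "n.Blob@<v-b4>->face"
version, nxt = read_version(expr, len("n.Blob"))
print(version, expr[nxt:])
```

The same function serves both the unstructured attribute and the semantic information name, since both use the same "@" and angle-bracket convention.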
Step 5: The attribute value of the unstructured data object is calculated at the designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object.
The object attribute calculation node may be denoted PPTPropertyAtVersion($property, $p-versionNumber). It obtains a specific version of an attribute, and its return value is the value of that attribute at the specified version. Here $property refers to an object that has attributes, such as a node or a relationship, and $p-versionNumber is the value of PPTVersionNumber parsed from the query statement.
Step 6: The semantic information query result is obtained at the semantic information calculation node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name.
The semantic information calculation node may be denoted PPTCustomPropertyAtVersion($property, $sub-property, $s-versionNumber). It obtains a specific version of the semantic information of an attribute (unstructured data), and its return value is the value of that semantic information at the specified version. Here $property is the return value of PPTPropertyAtVersion($property, $p-versionNumber), $sub-property is the name of the semantic information, and $s-versionNumber is the value of PPTVersionNumber parsed from the query statement.
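Steps 5 and 6 can be sketched as two functions, where the output of the first feeds the second. Everything below is a stand-in: BLOB_STORE and MODELS are hypothetical in-memory substitutes for the database and the AI models, the versions v-b3, v-sp3 and v-m3 are invented for illustration, and "latest" is assumed to mean the last entry inserted.

```python
# Hypothetical in-memory stand-ins for the database and the AI models.
BLOB_STORE = {  # (node id, attribute) -> versions in insertion order, newest last
    (1, "Blob"): {"v-b3": b"old-image", "v-b4": b"new-image"},
}
MODELS = {      # semantic name -> semantic version -> extraction function
    "face": {
        "v-sp3": lambda blob: {"model": "v-m3", "bytes": len(blob)},
        "v-sp4": lambda blob: {"model": "v-m4", "bytes": len(blob)},
    },
}

def latest(versions):
    """Newest key of an insertion-ordered dict (requires Python 3.8+)."""
    return next(reversed(versions))

def property_at_version(node_id, attribute, p_version=None):
    """Sketch of the object attribute calculation: attribute value at a version."""
    versions = BLOB_STORE[(node_id, attribute)]
    return versions[p_version if p_version is not None else latest(versions)]

def custom_property_at_version(prop_value, sub_property, s_version=None):
    """Sketch of the semantic information calculation: run the matching model."""
    versions = MODELS[sub_property]
    model = versions[s_version if s_version is not None else latest(versions)]
    return model(prop_value)

blob = property_at_version(1, "Blob", "v-b4")
print(custom_property_at_version(blob, "face", "v-sp4"))
```

Passing None for either version falls back to the newest entry, matching the default described for an omitted "@" suffix.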
Several specific embodiments of the invention are listed below:
Specific example 1:
Suppose that a user needs to query the face semantic information of the Blob attribute, with version number v-b4, of the node whose id is 1. That is, the face semantic information of the unstructured data with version number v-b4 is queried, and the query statement is: "MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face;"
After the system receives the query request, the query statement is parsed according to the method defined in the Disclosure of Invention. MATCH (n) WHERE id(n)=1 is first parsed into findnebsyid(1), an operator that searches for the node whose id is 1. The remainder of the query statement is then parsed as described in items 1-11) and 12) of the Disclosure of Invention; when the parser reaches the "@" symbol, it reads the content of the following "<>" pair and takes v-b4 as the version number of Blob. The abstract syntax tree obtained by parsing the query statement is shown in FIG. 2. The user does not specify the version of the semantic information, so the latest version is used by default.
Specific example 2:
Suppose that a user needs to query the face semantic information of the Blob attribute of the node whose id is 1, and requires the version number of the face information to be v-sp4. The query statement is: "MATCH (n) WHERE id(n)=1 RETURN n.Blob->face@<v-sp4>;" The abstract syntax tree obtained by parsing this query statement is shown in FIG. 3. The user does not specify the version of Blob, so the latest version is used by default.
Specific example 3:
Suppose that a user needs to query the face semantic information of the Blob attribute of the node whose id is 1, and requires the version number of Blob to be v-b4 and the version number of face to be v-sp4. The query statement is: "MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face@<v-sp4>;" The abstract syntax tree obtained by parsing this query statement is shown in FIG. 4. When PPTCustomPropertyAtVersion(n.Blob@<v-b4>, face, v-sp4) is executed, the model with version v-m4 is invoked to process the return value of PPTPropertyAtVersion (namely the unstructured data object with version number v-b4), and the final result is obtained.
In summary, the invention first parses the query statement and reorganizes and sorts the query conditions in it. The PPTVersionNumber of an unstructured attribute is obtained; if its value is v-p1, then v-p1 designates the version of the unstructured data object that the user wants to retrieve. The PPTVersionNumber of a piece of semantic information is likewise obtained from the query statement; if its value is v-sp1, then v-sp1 designates the version of the semantic information that the user wants to retrieve. Next, v-p1 is used as an input of PPTPropertyAtVersion() and, in combination with other filter conditions (such as attribute values), the unstructured attribute that satisfies the conditions is looked up in the database and denoted b1. Finally, b1, the semantic information name and v-sp1 are used as inputs of PPTCustomPropertyAtVersion, which invokes the artificial intelligence model that corresponds to the semantic information name and whose version corresponds to v-sp1, and extracts the corresponding version of the semantic information from the unstructured data b1.
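The reorganization summarized above can be illustrated by turning a whole statement into a textual plan. This Python sketch handles only the single statement shape used in the three examples; the regular expression, the function name plan and the use of the word "latest" for an omitted version are assumptions, not the parser or planner of the invention.

```python
import re

# Hypothetical pattern covering only statements shaped like the three examples:
# MATCH (v) WHERE id(v)=<int> RETURN v.attr[@<ver>]->name[@<ver>];
STATEMENT = re.compile(
    r"^\s*MATCH\s*\((?P<var>\w+)\)\s*WHERE\s+id\s*\(\s*(?P=var)\s*\)\s*=\s*(?P<id>\d+)\s*"
    r"RETURN\s+(?P=var)\.(?P<attr>\w+)(?:@<(?P<pv>[^>]+)>)?"
    r"->(?P<sem>\w+)(?:@<(?P<sv>[^>]+)>)?\s*;?\s*$",
    re.IGNORECASE,
)

def plan(statement):
    """Render the nested calculation implied by a statement as a string."""
    m = STATEMENT.match(statement)
    if m is None:
        raise ValueError("unsupported statement")
    pv = m["pv"] or "latest"   # omitted object version -> latest
    sv = m["sv"] or "latest"   # omitted semantic version -> latest
    return (
        f"PPTCustomPropertyAtVersion("
        f"PPTPropertyAtVersion({m['var']}.{m['attr']}, {pv}), {m['sem']}, {sv})"
    )

for stmt in (
    "MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face;",
    "MATCH (n) WHERE id(n)=1 RETURN n.Blob->face@<v-sp4>;",
    "MATCH (n) WHERE id(n)=1 RETURN n.Blob@<v-b4>->face@<v-sp4>;",
):
    print(plan(stmt))
```

The inner call is evaluated first and its result becomes the first input of the outer call, which is the order of evaluation the summary describes.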
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (10)

1. A semantic information query method oriented to multi-version unstructured data, the method comprising:
acquiring a query statement; wherein the query statement comprises: a node search operator, and an unstructured data object and a semantic information name connected by a first connector;
parsing the query statement into an abstract syntax tree; wherein the abstract syntax tree comprises: a node search node, a version extraction node, an object attribute calculation node and a semantic information calculation node;
running the node search operator at the node search node to obtain a specified object attribute calculation node;
obtaining the version number of the unstructured data object and the semantic information name based on the query statement at the version extraction node;
calculating the attribute value of the unstructured data object at a designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object;
and obtaining a semantic information query result at a semantic information computing node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name.
2. The method of claim 1, wherein the obtaining at the version extraction node a version number of the unstructured data object and the semantic information name based on the query statement comprises:
reading the query statement;
when the unstructured data object or the semantic information name is read, judging whether a second connector follows the unstructured data object or the semantic information name;
if the second connector is followed, reading all characters between the designated symbols to obtain the version number of the unstructured data object or the semantic information name;
and if the second connector is not followed, taking the version number of the latest version of the unstructured data object or the latest version of the semantic information name as the version number of the unstructured data object or the semantic information name.
3. The method of claim 2, wherein the designated symbols comprise angle brackets.
4. The method of claim 1, wherein the query statement is obtained by extending the Cypher query language.
5. The method of claim 1, wherein the unstructured data objects comprise: nodes and relationships.
6. The method of claim 1, wherein the version number of the semantic information name is generated by:
acquiring semantic information corresponding to the semantic information name;
acquiring the version number of the artificial intelligence model for extracting the semantic information;
and taking the version number of the artificial intelligence model as the version number of the semantic information name.
7. A semantic information query device facing multi-version unstructured data comprises:
a query statement acquisition module for acquiring a query statement; wherein the query statement comprises: a node search operator, and an unstructured data object and a semantic information name connected by a first connector;
a query statement analysis module for parsing the query statement into an abstract syntax tree; wherein the abstract syntax tree comprises: a node search node, a version extraction node, an object attribute calculation node and a semantic information calculation node; running the node search operator at the node search node to obtain a designated object attribute calculation node; obtaining the version number of the unstructured data object and the semantic information name based on the query statement at the version extraction node; calculating the attribute value of the unstructured data object at the designated object attribute calculation node according to the version number of the unstructured data object and the unstructured data object; and obtaining a semantic information query result at the semantic information calculation node based on the attribute value of the unstructured data object, the semantic information name and the version number of the semantic information name.
8. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform any of the methods of claims 1-6.
9. A computer device comprising a memory having a computer program stored therein and a processor arranged to execute the computer program to perform the method of any of claims 1-6.
10. A computer program product for causing a computer device to perform the method of any one of claims 1 to 6 when the computer program product is run on the computer device.
CN202210871371.1A 2022-07-22 2022-07-22 Semantic information query method and device for multi-version unstructured data Pending CN115357682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210871371.1A CN115357682A (en) 2022-07-22 2022-07-22 Semantic information query method and device for multi-version unstructured data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210871371.1A CN115357682A (en) 2022-07-22 2022-07-22 Semantic information query method and device for multi-version unstructured data

Publications (1)

Publication Number Publication Date
CN115357682A true CN115357682A (en) 2022-11-18

Family

ID=84031413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210871371.1A Pending CN115357682A (en) 2022-07-22 2022-07-22 Semantic information query method and device for multi-version unstructured data

Country Status (1)

Country Link
CN (1) CN115357682A (en)

Similar Documents

Publication Publication Date Title
CN109284363B (en) Question answering method and device, electronic equipment and storage medium
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
US9971967B2 (en) Generating a superset of question/answer action paths based on dynamically generated type sets
US9448995B2 (en) Method and device for performing natural language searches
CN115576984A (en) Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language
CN110674229A (en) AST-based relational database SQL table relational analysis and display method
JP4247135B2 (en) Structured document storage method, structured document storage device, structured document search method
CN116257610B (en) Intelligent question-answering method, device, equipment and medium based on industry knowledge graph
CN112507089A (en) Intelligent question-answering engine based on knowledge graph and implementation method thereof
CN113779062A (en) SQL statement generation method and device, storage medium and electronic equipment
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN113761162B (en) Code searching method based on context awareness
CN113032371A (en) Database grammar analysis method and device and computer equipment
CN112948419A (en) Query statement processing method and device
CN113032366A (en) SQL syntax tree analysis method based on Flex and Bison
CN114003231B (en) SQL syntax parse tree optimization method and system
CN115357682A (en) Semantic information query method and device for multi-version unstructured data
CN115687399A (en) Syntax parsing method and device for SQL (structured query language) statements
CN115292347A (en) Active SQL algorithm performance checking device and method based on rules
CN114840657A (en) API knowledge graph self-adaptive construction and intelligent question-answering method based on mixed mode
JP4635585B2 (en) Question answering system, question answering method, and question answering program
JP2922701B2 (en) Language conversion method
CN112799638B (en) Non-invasive rapid development method, platform, terminal and storage medium
CN113255374B (en) Question and answer management method and system
CN115827829B (en) Ontology-based search intention optimization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination