CN111552788B - Database retrieval method, system and equipment based on entity attribute relationship - Google Patents


Info

Publication number
CN111552788B
Authority
CN
China
Prior art keywords
data
query
module
database
entity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010334936.3A
Other languages
Chinese (zh)
Other versions
CN111552788A (en)
Inventor
叶杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhuochen Info Tech Co ltd
Original Assignee
Shanghai Zhuochen Info Tech Co ltd
Application filed by Shanghai Zhuochen Info Tech Co ltd
Priority to CN202010334936.3A
Publication of CN111552788A
Application granted
Publication of CN111552788B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a database retrieval method, system and equipment based on entity attribute relationships. The method comprises the following steps: S1, receiving query content input by a user through an application interface; S2, determining the query mode according to the query content, the query mode being a semantic query, a relational query or a direct query; S3, if the query is a semantic query, providing a unified structured semantic query interface, converting the query content into a standard data format and analyzing the entities, attributes and relationships of the query content with a preset algorithm; if the query is a relational query, executing step S4; and if the query is a direct query, executing step S4 or step S5; S4, comparing the query content with the triple data in a preset database; and S5, obtaining the retrieval result from the preset database and returning it to the application interface. Because the invention organizes data by entity, attribute and relationship, it facilitates data association analysis and provides a basis for data mining.

Description

Database retrieval method, system and equipment based on entity attribute relationship
Technical Field
The invention relates to the field of database retrieval, in particular to a database retrieval method, a system and equipment based on entity attribute relationship.
Background
Semantic query is the process of retrieving knowledge from a knowledge base on the basis of knowledge organization; it is an intelligent retrieval mode that builds on a knowledge organization system to achieve knowledge association and concept-level semantic retrieval. Early image retrieval described image features, such as the author, year, genre and size of a painting, in textual form; only later did techniques appear that analyze and retrieve the content semantics of an image, such as color, texture and layout. Text-based image retrieval typically queries images by keyword: to find a landscape picture, a user may enter "landscape painting"; to find a picture of a cat, the user can enter "cat" directly.
Although image retrieval has produced many methods for representing image features, such as histograms, color moments and color sets, moving beyond low-level features to genuine semantic retrieval remains difficult and progress is slow. Integrating functions such as semantic retrieval and image retrieval is costly and cannot be embedded into applications quickly and conveniently, while building micro-services between a traditional database and upper-layer applications increases system complexity and reduces performance.
To address these problems, the invention provides a database retrieval method, system and equipment based on entity attribute relationships, built on top of a traditional database.
Disclosure of Invention
Accordingly, a database retrieval method, system and device based on entity attribute relationships is needed to solve the problems of semantic query in conventional databases.
The invention discloses a database retrieval method based on entity attribute relationship, which comprises the following steps:
S1, receiving the query content input by a user through an application interface;
S2, determining the query mode according to the query content, the query mode being a semantic query, a relational query or a direct query;
S3, if the query is a semantic query, providing a unified structured semantic query interface, converting the query content into a standard data format, and analyzing the entities, attributes and relationships of the query content with a preset algorithm; if the query is a relational query, executing step S4; and if the query is a direct query, determining from the query content whether to execute step S4 or step S5;
S4, comparing the query content with the triple data in a preset database;
and S5, obtaining the retrieval result from the preset database and returning it to the application interface.
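The S2-S5 branching can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `TRIPLES` store, the `BASIC_INFO` dictionary, and the regex in `parse_semantic` (a stand-in for the unspecified preset NLP algorithm) are all hypothetical, and the query mode is passed in rather than judged.

```python
import re

# Hypothetical triple store, in the document's [entity, attribute, relationship] order.
TRIPLES = [
    ("Zhang San", "Jiangsu", "from"),
    ("Li Si", "Zhejiang", "from"),
    ("Wang Wu", "Jiangsu", "from"),
]
# Hypothetical basic information base that direct queries can read without parsing.
BASIC_INFO = {"Zhang San": {"mobile phone": "131xxxxxxxx"}}

Pattern = tuple  # (entity, attribute, relationship); None acts as a wildcard


def parse_semantic(query: str) -> Pattern:
    """S3 stand-in: a regex instead of the preset NLP algorithm."""
    m = re.fullmatch(r"people from (\w+)", query.strip())
    if m is None:
        raise ValueError(f"unsupported semantic query: {query!r}")
    return (None, m.group(1), "from")


def match_triples(pattern: Pattern) -> list:
    """S4: compare the structured query with every stored triple."""
    return [t for t in TRIPLES
            if all(p is None or p == v for p, v in zip(pattern, t))]


def retrieve(query: str, mode: str):
    """S2-S5 dispatch; `mode` is assumed to come from an upstream classifier."""
    if mode == "semantic":
        return match_triples(parse_semantic(query))        # S3 -> S4 -> S5
    if mode == "relational":
        return match_triples((query.strip(), None, None))  # toy: query is an entity name
    if mode == "direct":
        entity, _, field = query.partition("'s ")
        if field in BASIC_INFO.get(entity, {}):
            return BASIC_INFO[entity][field]                # answer directly (S5)
        return match_triples((query.strip(), None, None))  # otherwise fall back to S4
    raise ValueError(f"unknown query mode: {mode!r}")
```

For example, `retrieve("people from Jiangsu", "semantic")` returns the Zhang San and Wang Wu triples, while `retrieve("Zhang San's mobile phone", "direct")` skips triple matching entirely.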
In one embodiment, step S1 is preceded by:
S01, reading external source data and a basic information base, converting the external source data into the standard data format, and storing it together with the basic information base in the preset database;
S02, after querying and verifying the external source data, calling a preset algorithm to extract triple data;
and S03, establishing a data index mapping for the triple data and storing the triple data in the preset database.
In one embodiment, extracting the triple data in step S02 includes:
identifying the fields of the external source data as entity fields and entity attribute fields;
calling a preset algorithm to convert the non-standard part of the external source data into an entity and attribute structure;
extracting the relationships and associated attributes between entities.
In one embodiment, the analysis in step S3 includes Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing and text classification.
The invention also provides a database retrieval system based on entity attribute relationships, comprising an application layer, a service layer and a data layer, wherein the service layer comprises a natural language processing module, an intelligent retrieval module and a data interface module, and the data layer comprises a basic function module, a data execution module and a preset database; wherein:
the application layer is used for receiving query contents input by a user from the application interface;
the data interface module is used for connecting an application layer and a service layer, forwarding the query content to the intelligent retrieval module and returning a retrieval result to the application layer;
the intelligent retrieval module is used for judging a query mode according to the query content, wherein the query mode comprises semantic query, relational query and direct query, providing a unified structured semantic query interface, and calling the natural language processing module to analyze and process the query content;
the natural language processing module is used for converting the query content into a standard data format and calling a preset algorithm to analyze and process entities, attributes and relationships of the query content, wherein the analysis and process comprises Chinese word segmentation, part of speech tagging, named entity identification, keyword and phrase extraction, pinyin conversion, text recommendation, dependency syntactic analysis and text classification;
the data execution module is used for calling a preset algorithm to extract triple data after querying and verifying the external source data;
the basic function module is used for establishing data index mapping on the triple data and then storing the triple data to the preset database;
and the preset database is used for storing data, including the triple data, the external source data, the basic information base and system data.
In one embodiment, the system further comprises:
a graph module, which is used for converting the external source data into graph data and returning the graph data to the application layer through the data interface module.
In one embodiment, the data execution module comprises a query and verification unit, a triple extraction unit and a read-write control unit; wherein:
the query and verification unit is used for querying and verifying the external source data;
the triple extraction unit is used for identifying the fields of the external source data as entity fields and entity attribute fields, calling a preset algorithm to convert the non-standard part of the external source data into an entity and attribute structure, and extracting the relationships and associated attributes between entities;
and the read-write control unit is used for storing the triple data to the preset database and reading the data stored in the preset database.
In one embodiment, the application layer includes a system management module, a log management module, a data display module, and an external calling module, the system management module is configured to configure the system according to user needs, the log management module is configured to view a log of a current system and a system running state, the data display module is configured to view data in a current database so as to debug statistical data, and the external calling module is configured to call external source data.
The invention provides a database retrieval device based on entity attribute relationship, which comprises a memory and a processor, wherein the memory is stored with computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the database retrieval method based on entity attribute relationship, and the database retrieval device can adopt a distributed computing framework.
The invention provides a database retrieval device based on entity attribute relationship, which comprises a display terminal, wherein the display terminal is provided with a database retrieval system based on the entity attribute relationship, and the database retrieval device can adopt a distributed computing framework.
The database retrieval method, the database retrieval system and the database retrieval equipment based on the entity attribute relationship are based on the data organization mode of the entity, the attribute and the relationship, are convenient for data association analysis and provide a realization basis for data mining.
Drawings
FIG. 1 is an architecture diagram of a database retrieval system based on entity attribute relationships, in one embodiment;
FIG. 2 is a flow diagram of a method for data retrieval based on entity attribute relationships in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is an architecture diagram of a database retrieval system based on entity attribute relationships in one embodiment. The system comprises an application layer 100, a business layer 200 and a data layer 300.
The application layer 100 is located at the uppermost layer, and is used for displaying data and receiving data input by a user, so as to provide an interactive application interface for the user. The application layer 100 includes a system management module 101, a log management module 102, a data presentation module 103, and an external call module 104. The system management module 101 is configured to configure a system according to a user requirement, the log management module 102 is configured to view a log of a current system and a system running state, the data display module 103 is configured to view data in a current database so as to debug statistical data, and the external call module 104 is configured to call external source data. The log management module 102 may screen various types of logs through a semantic retrieval manner. The invention adds a configuration management interface in the application layer for checking the running state of the database and setting or selecting the function of the system, so the database retrieval system based on the entity attribute relationship has the function of uniform system management.
On the user interface of the application layer, the invention provides multiple query modes, such as graphical filtering, semantic query, relational query, direct query and user-defined query languages, which simplify user operation and allow data to be displayed visually.
The business layer 200 handles the data service logic and serves as the bridge between the application layer 100 and the data layer 300. The business layer 200 includes a retrieval engine 201 and a data interface module 202; the retrieval engine 201 consists of a natural language processing module 2011, a graph module 2012 and an intelligent retrieval module 2013. Providing the retrieval engine 201 in the business layer 200 greatly reduces development cost and improves retrieval efficiency.
The data interface module 202 provides an access interface 203 for connecting the application layer 100 and the service layer 200, and is responsible for forwarding the query content input by the user in the application layer 100 to the intelligent retrieval module 2013, and returning the retrieval result to the external calling module 104 of the application layer 100 in a preset format. The preset format can be set by the user, and for example, the preset format can be text data or graphic data.
The intelligent retrieval module 2013 obtains data of query contents from the data interface module 202, then determines a query mode according to the query contents, where the query mode includes semantic query, relational query and direct query, provides a unified structured semantic query interface, converts the query contents into a standard data format, and calls the natural language processing module 2011 to analyze and process the query contents. It should be noted that the query content input by the user needs to be converted into standard data before being identified by the preset database 303.
A direct query supports common query modes such as database, fuzzy and full-text queries. The following examples show how the query mode is determined from the user's input: if the user enters "people who came to Shanghai from Zhejiang on April 10", the query is a semantic query; if the user enters "people traveling with Zhang San", it is a relational query; and if the user enters "Zhang San's mobile phone number", it is a direct query. A direct query can be executed without analyzing the query content, and the data it targets may come from the basic information base or from system data. Direct queries support traditional query languages such as SQL, LINQ, SPARQL and PHQ.
Further, the intelligent retrieval module 2013 is compatible with query modes of different types and languages; it can modify the data obtained from the data interface module 202, call the natural language processing module 2011 to analyze the query content, and pass the result to the data layer 300. Because the preset database 303 stores data as entity, attribute and relationship triples, the intelligent retrieval module 2013 makes it easy for the graph module 2012 to organize the data into a graph structure, providing data support for building the graph.
The triple storage mode based on entities, attributes and relationships therefore facilitates data association analysis and provides a basis for data mining.
It should be noted that standard data refers to a database language that the preset database 303 can recognize and that follows a preset format specification. The invention does not restrict this format; it can be understood as converting data in any format, such as XML (Extensible Markup Language), database tables, structs, byte streams, protobuf (Google Protocol Buffers, an efficient, lightweight structured data storage mechanism usable for communication protocols, data storage and so on) or binary streams, into a standardized structure such as JSON (JavaScript Object Notation, a lightweight data exchange format). In general, both the query content input by the user and the external source data must be converted into the standard data format.
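The patent does not state how the query mode is judged. One simple possibility is a rule-based classifier; the sketch below uses hypothetical cue phrases (`RELATION_CUES`), a hypothetical query-language prefix check, and English renderings of the three example queries, and is only meant to show the shape of such a step.

```python
import re

# Hypothetical cues: the patent does not say how the mode is judged.
QUERY_LANGUAGE = re.compile(r"^\s*(SELECT|PREFIX|ASK|CONSTRUCT)\b", re.IGNORECASE)
RELATION_CUES = ("traveling with", "same city as")


def classify_query(text: str) -> str:
    """Return 'direct', 'relational' or 'semantic' for a user query."""
    if QUERY_LANGUAGE.match(text):
        return "direct"        # SQL/SPARQL text is passed through unchanged
    lowered = text.lower()
    if any(cue in lowered for cue in RELATION_CUES):
        return "relational"    # asks for entities linked to a named entity
    if "'s " in text:
        return "direct"        # possessive attribute lookup, e.g. a phone number
    return "semantic"          # free text that needs NLP analysis
```

A production system would more likely learn this decision from labeled queries; fixed rules are used here only because they are easy to inspect.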
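As an illustration of normalizing heterogeneous inputs into JSON, the sketch below accepts a dict, UTF-8 JSON bytes, or a flat XML element and emits one canonical JSON string. The choice of inputs and the flat field-to-value mapping are assumptions; the patent does not define the standard format's schema.

```python
import json
import xml.etree.ElementTree as ET


def to_standard(data) -> str:
    """Convert a dict, JSON bytes, or flat XML text into canonical JSON."""
    if isinstance(data, dict):
        payload = data
    elif isinstance(data, (bytes, bytearray)):
        payload = json.loads(data.decode("utf-8"))      # byte stream carrying JSON
    elif isinstance(data, str) and data.lstrip().startswith("<"):
        root = ET.fromstring(data)                      # flat XML: one child per field
        payload = {child.tag: (child.text or "").strip() for child in root}
    else:
        raise TypeError(f"unsupported input type: {type(data).__name__}")
    # sorted keys make equal records serialize identically
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)
```

All three inputs describing the same record produce the same string, which is what lets later stages treat external source data uniformly.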
The graph module 2012 is configured to convert the external source data into graph data and return the graph data to the application layer 100 through the data interface module 202.
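A minimal sketch of turning triples into graph data, assuming a node-link JSON layout (a common input shape for front-end graph libraries) in which entities and attribute values become nodes and the relationship labels the edge. The layout is an assumption; the patent does not specify the graph format.

```python
def triples_to_graph(triples):
    """Build node-link graph data from (entity, attribute, relationship) triples."""
    nodes, index, links = [], {}, []

    def node_id(label):
        # reuse a node when the same label appears in several triples
        if label not in index:
            index[label] = len(nodes)
            nodes.append({"id": index[label], "label": label})
        return index[label]

    for entity, attribute, relationship in triples:
        links.append({"source": node_id(entity),
                      "target": node_id(attribute),
                      "label": relationship})
    return {"nodes": nodes, "links": links}
```

Because shared values such as a city become a single node, two people from the same place end up connected through it, which is the kind of association the graph view is meant to expose.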
The natural language processing module 2011 is configured to convert the query content into the standard data format and call a preset algorithm to analyze it; the analysis includes Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing and text classification. The models or algorithms used for each analysis are described below.
The models or algorithms used for Chinese word segmentation include, but are not limited to, Forward Maximum Matching (FMM), Backward Maximum Matching (BMM) and the N-shortest-path method. The models or algorithms used for part-of-speech tagging include, but are not limited to, Hidden Markov Models (HMMs, statistical models that describe a Markov process with hidden, unobserved states), Conditional Random Fields (CRFs) and Recurrent Neural Networks (RNNs). The models or algorithms used for named entity recognition include, but are not limited to, the Maximum Entropy model (MaxEnt) and the Support Vector Machine (SVM).
The models or algorithms used for keyword and phrase extraction include, but are not limited to, the TextRank algorithm (used for keyword extraction, and also for phrase extraction and automatic summarization) and the TF-IDF algorithm (term frequency-inverse document frequency). TF-IDF is a weighting technique commonly used in information retrieval and text mining; it is a statistical method for evaluating how important a word is to one document in a collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but decreases as the number of documents in the corpus that contain it increases. The main idea of TF-IDF is that if a word or phrase appears frequently in one document (high TF) but rarely in other documents, it has good discriminating power and is suitable for classification.
The models or algorithms used for pinyin conversion include, but are not limited to, hidden Markov models and the Viterbi algorithm, a dynamic programming algorithm for finding the most likely sequence of hidden states (the Viterbi path) that produces a given sequence of observed events, particularly in the context of Markov information sources and hidden Markov models.
The models or algorithms used for text recommendation include, but are not limited to, collaborative filtering, a well-known and widely used recommendation algorithm that mines users' historical behavior data to discover their preferences and predicts the items they are likely to prefer. Commonly used collaborative filtering algorithms fall into two types: user-based collaborative filtering and item-based collaborative filtering.
The models or algorithms used for dependency parsing include, but are not limited to, dynamic programming (DP), a branch of operations research and a mathematical method for optimizing multi-stage decision processes, generally used to solve problems that have optimal substructure.
The models or algorithms used for text classification include, but are not limited to, decision trees, neural networks, naive Bayes, k-nearest neighbors (KNN), Support Vector Machines (SVM), the Rocchio algorithm (an efficient classification algorithm widely used in text classification, query expansion and related fields), the AdaBoost algorithm (which combines the outputs of many weak classifiers to achieve better classification), the word2vec model and neural-network-based N-gram language models. word2vec is a group of related models used to generate word vectors; they are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words: given a word as input, the network predicts the words in adjacent positions, and under word2vec's bag-of-words assumption the order of those context words does not matter. After training, the word2vec model maps each word to a vector, taken from the network's hidden layer, that can be used to represent relationships between words.
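Forward and backward maximum matching can be sketched in a few lines. The vocabulary below is a hypothetical toy dictionary chosen to reproduce the textbook ambiguity in "研究生命起源" ("study the origin of life"), where FMM greedily takes "研究生" (graduate student) and BMM recovers the intended reading.

```python
def forward_max_match(text, vocab, max_len):
    """FMM: scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:   # single characters are the fallback
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """BMM: scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.insert(0, piece)
                j -= size
                break
    return tokens
```

With `vocab = {"研究", "研究生", "生命", "起源"}` and `max_len=3`, FMM yields `["研究生", "命", "起源"]` while BMM yields `["研究", "生命", "起源"]`; running both and comparing is a common way to flag ambiguous spans.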
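A minimal TF-IDF sketch over pre-segmented documents, using the plain variant tf = count / document length and idf = log(N / df) with no smoothing (many libraries add smoothing terms, so absolute values differ across implementations). The token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores
```

For `[["person", "from", "Jiangsu"], ["person", "from", "Zhejiang"], ["person", "phone"]]`, "person" appears in every document and gets weight 0, while "Jiangsu" gets the highest weight in the first document, matching the intuition in the paragraph above.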
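A minimal Viterbi sketch for pinyin-to-character conversion, treating characters as hidden states and pinyin syllables as observations. The candidate characters and all probabilities are made-up illustrative values, not a trained language model.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for `obs`; zero-probability moves are skipped."""
    # best[t][s] = (log probability of the best path ending in s at step t, previous state)
    best = [{}]
    for s in states:
        p = start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0)
        if p > 0:
            best[0][s] = (math.log(p), None)
    for t in range(1, len(obs)):
        best.append({})
        for s in states:
            e = emit_p[s].get(obs[t], 0.0)
            if e == 0:
                continue
            options = [
                (lp + math.log(trans_p.get(prev, {}).get(s, 0.0)) + math.log(e), prev)
                for prev, (lp, _) in best[t - 1].items()
                if trans_p.get(prev, {}).get(s, 0.0) > 0
            ]
            if options:
                best[t][s] = max(options)
    # pick the best final state (raises ValueError if no path survives), then backtrack
    last = max(best[-1], key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, best[t][path[0]][1])
    return path, best[-1][last][0]
```

With hypothetical candidates 江/将 for "jiang" and 苏/速 for "su", and a transition table favoring 江→苏, the call `viterbi(["jiang", "su"], ...)` returns the path 江苏 (Jiangsu) with probability 0.36. Working in log space avoids underflow on long inputs.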
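A minimal sketch of item-based collaborative filtering for recommending query texts: item similarity is cosine similarity over user ratings, and a candidate's score is the similarity-weighted average of the user's own ratings. The user IDs, query identifiers and ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {user: rating} vectors."""
    num = sum(a[u] * b[u] for u in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def recommend(ratings, user, k=2):
    """Rank items the user has not seen by similarity to items they rated."""
    items = {}
    for u, row in ratings.items():          # invert to item -> {user: rating}
        for item, r in row.items():
            items.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for cand in items:
        if cand in seen:
            continue
        sims = [(cosine(items[cand], items[i]), r) for i, r in seen.items()]
        denom = sum(abs(s) for s, _ in sims)
        if denom:                            # skip items unrelated to anything seen
            scores[cand] = sum(s * r for s, r in sims) / denom
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In the test data, user `u1` has never issued `q_phone`, but `u2`, who shares u1's other queries, has, so `q_phone` is recommended; `q_hangzhou` shares no users with u1's items and is not scored.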
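The patent names dynamic programming for dependency parsing without naming a specific algorithm. Eisner's algorithm is a standard DP for first-order projective dependency parsing and is swapped in here as an illustration; the arc scores are hypothetical numbers rather than the output of a trained model.

```python
NEG = float("-inf")


def eisner(score):
    """Best projective tree for arc scores score[head][dependent]; token 0 is ROOT."""
    n = len(score)
    # C/I[s][t][d]: best complete/incomplete span s..t; d=0 head at t, d=1 head at s
    C = [[[0.0, 0.0] if s == t else [NEG, NEG] for t in range(n)] for s in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bpC = [[[0, 0] for _ in range(n)] for _ in range(n)]
    bpI = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # incomplete spans: two complete halves plus an arc between s and t
            r = max(range(s, t), key=lambda r: C[s][r][1] + C[r + 1][t][0])
            base = C[s][r][1] + C[r + 1][t][0]
            I[s][t][0], bpI[s][t][0] = base + score[t][s], r   # arc t -> s
            I[s][t][1], bpI[s][t][1] = base + score[s][t], r   # arc s -> t
            # complete spans: an incomplete span extended by a complete one
            r = max(range(s, t), key=lambda r: C[s][r][0] + I[r][t][0])
            C[s][t][0], bpC[s][t][0] = C[s][r][0] + I[r][t][0], r
            r = max(range(s + 1, t + 1), key=lambda r: I[s][r][1] + C[r][t][1])
            C[s][t][1], bpC[s][t][1] = I[s][r][1] + C[r][t][1], r

    heads = [-1] * n

    def walk(s, t, d, complete):
        # follow back-pointers and record each arc when an incomplete span is reached
        if s == t:
            return
        if complete:
            r = bpC[s][t][d]
            if d == 0:
                walk(s, r, 0, True)
                walk(r, t, 0, False)
            else:
                walk(s, r, 1, False)
                walk(r, t, 1, True)
        else:
            r = bpI[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            walk(s, r, 1, True)
            walk(r + 1, t, 0, True)

    walk(0, n - 1, 1, True)
    return heads, C[0][n - 1][1]
```

For the tokens ROOT, "from", "Jiangsu", "person" with hypothetical high scores on ROOT→person, person→from and from→Jiangsu (all other arcs scoring 1), the parser returns heads `[-1, 3, 1, 0]`: "person" is the head of the phrase, matching the subject identified in the example that follows.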
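Of the classifiers listed, naive Bayes is the simplest to show end to end. The sketch below is a multinomial naive Bayes with add-one smoothing, applied to hypothetical tokenized log lines, since the log management module screens logs by type; the labels and training data are illustrative only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            lp = math.log(n / total_docs)  # class prior
            # unseen words are ignored; seen words use smoothed likelihoods
            lp += sum(math.log((counts[w] + 1) / denom) for w in doc if w in self.vocab)
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label
```

Trained on two "error" lines and two "access" lines, the model labels `["write", "timeout"]` as an error and `["user", "query"]` as an access record.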
Specifically, the natural language processing module 2011 can analyze input text, even chat records, at high speed in real time: the tagging model performs Chinese word segmentation and part-of-speech tagging on the sentences or paragraphs, and a semantic analysis model then extracts the lexical, syntactic and dependency relations from the tagging result (for example subject-verb, verb-object, indirect-object, fronted-object, pivot, attribute, adverbial, verb-complement, coordination, preposition-object, left-adjunct, right-adjunct, independent structure, punctuation and head relations).
Dependency parsing analyzes a sentence into a dependency tree that describes the dependency relations among its words, and establishes a dependency dictionary for each word that records its syntactic collocations with other words. Specifically, when the user inputs query content, the database engine analyzes it to obtain the subject of the query and all of the subject's attributes. For example, if the query content is "people from Jiangsu", the tagging model splits the sentence into "from", "Jiangsu" and "people", and the dependency analysis of the semantic analysis model then determines that the subject is "person". The data engine is informed that the entity to retrieve is "person" and the relationship is "from Jiangsu", and the attributes may include the ID card number, train number, mobile phone number and so on in all matching personnel records.
The following illustrates an analysis process of the natural language processing module in an embodiment of the present invention:
(1) Assume the user's query content is a semantic query, for example: "people from Jiangsu".
Syntactic dependency: from | Jiangsu | person. Part-of-speech tagging: the text is fed into the recurrent neural network model, which outputs: from/p | Jiangsu/n | person/n;
Entity recognition: the text is fed into the support vector machine model, which outputs: "person";
Attribute recognition: the attribute distinguished from the entity: "Jiangsu";
Relationship recognition: from the syntactic analysis result: "from";
The triple (entity, attribute, relationship) is formed: "person", "Jiangsu", "from";
Querying the triple data yields: {Zhang San, Jiangsu, from}.
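The steps of example (1) can be sketched as rules over part-of-speech tags: the last noun is taken as the entity, the preposition as the relationship, and the noun after the preposition as the attribute. These rules, the tag set, and the `entity_types` lookup (mapping stored names to entity types such as "person") are hypothetical simplifications of the trained models the text describes.

```python
def build_query_triple(tagged):
    """Turn (word, POS) pairs into an (entity, attribute, relationship) pattern."""
    words = [w for w, _ in tagged]
    entity = next(w for w, tag in reversed(tagged) if tag == "n")    # head noun
    relationship = next(w for w, tag in tagged if tag == "p")        # preposition
    after = tagged[words.index(relationship) + 1:]
    attribute = next(w for w, tag in after if tag == "n" and w != entity)
    return entity, attribute, relationship


def run_query(query_triple, triples, entity_types):
    """Match stored triples whose entity has the requested type."""
    e, a, r = query_triple
    return [t for t in triples
            if entity_types.get(t[0]) == e and t[1] == a and t[2] == r]
```

For the tagged input `[("from", "p"), ("Jiangsu", "n"), ("person", "n")]` this builds `("person", "Jiangsu", "from")`, and against a store containing Zhang San from Jiangsu and Li Si from Hangzhou it returns only the Zhang San triple.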
(2) Assume the user's query content is a relational query, for example: "people traveling with Zhang San".
Subject recognition (by natural language processing, keyword methods or other means): person;
Key predicate: "traveling with";
Query condition: { E: "person", S: "Zhang San", R: "traveling with" };
Querying relational data: the people traveling with Zhang San are analyzed in real time;
Result: Li Si and Wang Wu.
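The real-time companion analysis in example (2) is not specified. A minimal sketch, assuming "traveling with" means sharing at least one (train, date) trip in hypothetical travel records, is:

```python
def companions(records, person):
    """People who share at least one (train, date) trip with `person`."""
    trips = {(train, date) for name, train, date in records if name == person}
    return sorted({name for name, train, date in records
                   if name != person and (train, date) in trips})


# Hypothetical records: (name, train, date); train IDs are placeholders.
RECORDS = [
    ("Zhang San", "T1", "04-10"),
    ("Li Si", "T1", "04-10"),
    ("Wang Wu", "T1", "04-10"),
    ("Zhao Liu", "T1", "04-11"),   # same train, different day
    ("Qian Qi", "T2", "04-10"),    # same day, different train
]
QUERY = {"E": "person", "S": "Zhang San", "R": "traveling with"}
```

Calling `companions(RECORDS, QUERY["S"])` returns `["Li Si", "Wang Wu"]`; requiring both train and date to match is what excludes Zhao Liu and Qian Qi.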
The data layer 300 includes a basic function module 301, a data execution module 302, and a preset database 303.
The data execution module 302 is configured to invoke a preset algorithm to extract triple data after performing data query and verification on the external source data, and the data execution module 302 includes a query and verification unit 3021, a triple extraction unit 3022, and a read-write control unit 3023.
The query and verification unit 3021 is configured to query and verify the external source data.
The triple extracting unit 3022 is configured to identify the external source data as an entity field and an entity attribute field, call a preset algorithm to convert part of non-standard data of the external source data into an entity and attribute structure, and extract a relationship and an associated attribute between entities.
Specifically, the triple extraction unit 3022 configures how it connects to the preset database 303 and actively extracts data from the preset database 303 according to preset rules. For example, the preset database 303 stores records made up of fields, and one record in the preset database 303 is "Zhang San" | "male" | "Zhejiang" | "50". The recognition model of the triple extraction unit 3022 calls named entity recognition (NER) in the natural language processing module 2011 to recognize the entity in the record, namely Zhang San; the conversion algorithm inside the triple extraction unit 3022 then extracts the attributes by combining the existing basic information base with programming techniques such as regular-expression matching, so that the attributes of Zhang San are gender, native place, age and so on. Finally, the relationship model of the triple extraction unit 3022 calls dependency parsing in the natural language processing module 2011 to analyze the relationship, which is "from".
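A minimal sketch of turning the example record into triples. It assumes the name field comes first (standing in for NER), uses a regex for the age, checks native place against a set of known places (standing in for the basic information base), and maps each attribute to a hypothetical relationship label via `RELATION_FOR`; none of these names come from the patent.

```python
import re

# Hypothetical mapping from attribute name to the relationship label stored in the triple.
RELATION_FOR = {"gender": "gender", "native_place": "from", "age": "age"}
GENDERS = {"male", "female"}


def extract_triples(record, known_places):
    """Split a '|'-delimited record into (entity, value, relationship) triples."""
    fields = [f.strip().strip('"') for f in record.split("|")]
    entity, rest = fields[0], fields[1:]   # stand-in for NER: assume the name comes first
    triples = []
    for value in rest:
        if value in GENDERS:
            attr = "gender"
        elif re.fullmatch(r"\d{1,3}", value):   # regular-expression match for an age
            attr = "age"
        elif value in known_places:              # lookup in the basic information base
            attr = "native_place"
        else:
            continue                             # unrecognized field: leave for manual review
        triples.append((entity, value, RELATION_FOR[attr]))
    return triples
```

Applied to `"Zhang San" | "male" | "Zhejiang" | "50"`, this yields triples for gender, native place (with relationship "from") and age.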
The read-write control unit 3023 is configured to store the triple data in the preset database 303, and read data stored in the preset database 303. The read-write control unit 3023 can read and write data quickly with high concurrency.
The basic function module 301 is configured to establish data index mapping for the triple data and store the triple data in the preset database 303. The basic function module 301 is responsible for module function encapsulation and inter-module communication, index mapping is established for data conforming to an access standard format, and partial non-standard data can be stored in the preset database 303 after being converted into the standard format, so that a user does not need to perform structured processing on the data. Further, the basic function module 301 may perform data processing and normalization on the acquired external source data.
Further, the basic function module 301 adopts distributed computing, because a single device can hardly handle the analysis and storage of massive amounts of data. A computing mechanism is needed that divides the computing task evenly among multiple machines; fully scheduling and using the computing resources of a cluster lowers the performance required of any single machine and allows very large computing tasks to be completed effectively.
However, developing and maintaining distributed computing involves many complex and varied considerations. During distributed computation, the exchange of control information, the data acquisition of each task, the merging of computing results and the rollback of failed computations must all be handled to ensure normal operation. Therefore, one embodiment of the invention adopts distributed computing frameworks such as Hadoop, Spark or Storm, which offer the following advantages: scarce resources can be shared, the computing load can be balanced across multiple computers, and programs can be placed on the computers best suited to run them. Sharing scarce resources and balancing load are core ideas of distributed computing.
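The divide-and-merge idea behind frameworks such as Hadoop or Spark can be illustrated on a single machine. In this sketch, thread-pool workers stand in for cluster nodes (an assumption for illustration only): rows are partitioned round-robin, each worker builds a partial entity index (map), and the partial results are merged (reduce). It does not use any real Hadoop or Spark API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_shard(shard):
    """Map step: build a partial entity -> positions index for one shard."""
    partial = defaultdict(list)
    for pos, (entity, _attribute, _relationship) in shard:
        partial[entity].append(pos)
    return partial


def reduce_partials(partials):
    """Reduce step: merge partial indexes into one sorted index."""
    merged = defaultdict(list)
    for partial in partials:
        for key, positions in partial.items():
            merged[key].extend(positions)
    return {k: sorted(v) for k, v in merged.items()}


def distributed_index(rows, workers=3):
    """Partition rows round-robin, index shards in parallel, then merge."""
    shards = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:   # stand-in for cluster nodes
        partials = list(pool.map(map_shard, shards))
    return reduce_partials(partials)
```

The merge step is where a real framework would also handle the concerns listed above, such as retrying or rolling back a failed shard.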
The preset database 303 is used for storing data, including the triple data, external source data, a basic information base, system data, and the like.
The following illustrates an exemplary triple data format and the data index mapping stored in the preset database according to an embodiment of the present invention.
The triples stored in the preset database 303 have the format [entity, attribute, relationship]. The example below also illustrates the analysis performed by the natural language processing module:
Triple data stored in the preset database 303: {"Zhang San", "Hangzhou", "from"},
{"Zhang San", "cell phone", "131xxxxxxxx"}, {"Li Si", "Hangzhou native", "is"}.
By comparing the query content analyzed by the natural language processing module 2011 with the triple data stored in the preset database 303, the following can be inferred:
{"Li Si", "Zhang San", "same city"}
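The inference above can be reproduced with a simple rule. The patent does not describe how "same city" is derived; the sketch below assumes a hypothetical rule table (`CITY_RULES`) that maps a (relationship, attribute) pair to the city it implies, and then pairs up entities that resolve to the same city.

```python
from itertools import combinations

Triple = tuple[str, str, str]  # [entity, attribute, relationship]

# Hypothetical rules: which (relationship, attribute) pairs imply a home city.
CITY_RULES = {
    ("from", "Hangzhou"): "Hangzhou",
    ("is", "Hangzhou native"): "Hangzhou",
}


def infer_same_city(triples: list[Triple]) -> list[Triple]:
    """Return (A, B, "same city") triples for entities that resolve to one city."""
    city_of: dict[str, str] = {}
    for entity, attribute, relationship in triples:
        city = CITY_RULES.get((relationship, attribute))
        if city is not None:
            city_of[entity] = city
    inferred: list[Triple] = []
    # Sorting the entity names makes the output order deterministic.
    for a, b in combinations(sorted(city_of), 2):
        if city_of[a] == city_of[b]:
            inferred.append((a, b, "same city"))
    return inferred


stored = [
    ("Zhang San", "Hangzhou", "from"),
    ("Zhang San", "cell phone", "131xxxxxxxx"),
    ("Li Si", "Hangzhou native", "is"),
]
print(infer_same_city(stored))  # [('Li Si', 'Zhang San', 'same city')]
```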
The established data index mapping has the following format:
{ tag: {"Zhang San", "Hangzhou", "from"}, pos: 10 };
{ tag: {"Zhang San", "cell phone", "131xxxxxxxx"}, pos: 11 };
{ tag: {"Li Si", "Hangzhou native", "is"}, pos: 15 };
A query for the entity "Zhang San" matches pos: 10 and pos: 11 in the data index mapping table, which avoids a full scan of the original table. This realizes semantic retrieval over the database, is flexible to build and easy to integrate, and improves system performance.
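A data index mapping of this kind behaves like an inverted index from entity to storage position. The sketch below is a minimal illustration, assuming (hypothetically) that rows are kept in a dict keyed by position and that lookups key on the entity field; it is not the patent's storage engine, and `build_index` and `lookup` are illustrative names.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # [entity, attribute, relationship]


def build_index(rows: dict[int, Triple]) -> dict[str, list[int]]:
    """Map each entity to the sorted positions of the rows where it is the subject."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in sorted(rows):
        index[rows[pos][0]].append(pos)
    return dict(index)


def lookup(index: dict[str, list[int]], rows: dict[int, Triple], entity: str) -> list[Triple]:
    """Fetch only the indexed rows instead of scanning every row."""
    return [rows[pos] for pos in index.get(entity, [])]


rows = {
    10: ("Zhang San", "Hangzhou", "from"),
    11: ("Zhang San", "cell phone", "131xxxxxxxx"),
    15: ("Li Si", "Hangzhou native", "is"),
}
index = build_index(rows)
print(index["Zhang San"])  # [10, 11]
```

The lookup cost depends on the number of matching rows, not on the size of the table, which is the point of avoiding a full scan.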
Further, the preset database 303 adopts a high-performance processing scheme for massive data. Its storage mode combines media such as disk, memory, and files, and it supports distributed deployment, load balancing, and disaster-recovery backup; the optimal strategy can be selected by jointly evaluating factors such as the server environment and hardware performance. In addition, data index mappings are established for the triple data stored in the preset database 303 to support high-speed retrieval, which makes the preset database 303 suitable for applications with high-concurrency and high-throughput requirements.
Specifically, the ways in which the preset database 303 establishes the data index mapping include, but are not limited to, establishing data entries: each piece of metadata is a data entry composed of an entity, an attribute, and a relationship, and each entry is unique as a subject and represents specific, meaningful data.
Further, the present invention may also read in external source data, a basic information base, and a blockchain. The external source data includes databases, files, Kafka (a high-throughput distributed publish-subscribe messaging system that can process all of the action stream data generated by consumers on a website), data lakes (centralized stores of massive, multi-source, multi-type data), and Hadoop (a software framework for distributed processing of large amounts of data). The basic information base includes IP addresses, region codes, telephone number locations, address latitudes and longitudes, and the like. After a data index mapping is established, the data is stored in the preset database 303. The blockchain is a distributed shared database.
In one example, a database retrieval device is provided, which includes a display terminal on which a database retrieval system based on entity attribute relationships is provided. The system includes an application layer, a service layer, and a data layer; the service layer includes a natural language processing module, an intelligent retrieval module, and a data interface module, and the data layer includes a basic function module, a data execution module, and a preset database.
The application layer is used for receiving query content input by a user through the application interface. The data interface module connects the application layer and the service layer, forwards the query content to the intelligent retrieval module, and returns the retrieval result to the application layer. The intelligent retrieval module determines the query mode from the query content, where the query mode is semantic query, relational query, or direct query; it provides a unified structured semantic query interface and calls the natural language processing module to analyze the query content. The natural language processing module converts the query content into a standard data format and calls a preset algorithm to analyze the entities, attributes, and relationships in the query content; the analysis includes Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing, and text classification. The basic function module establishes a data index mapping for the triple data and then stores the triple data in the preset database. The data execution module queries and verifies the external source data and then invokes a preset algorithm to extract triple data from it. The preset database stores data, including the triple data, external source data, a basic information base, and system data.
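The verify-then-extract behaviour of the data execution module can be sketched as follows. The field names, the `key=value;key=value` text format standing in for "non-standard data", and the validity check are all hypothetical; the patent does not specify its preset algorithm. Each retained record yields one [entity, attribute, relationship] triple per non-entity field, in the same layout as the {"Zhang San", "cell phone", "131xxxxxxxx"} example; relationships between entities are not extracted here.

```python
from typing import Union

Triple = tuple[str, str, str]  # [entity, attribute, relationship]


def normalize(raw: Union[dict, str]) -> dict:
    """Convert one hypothetical non-standard input ('k=v; k=v' text) into a dict."""
    if isinstance(raw, dict):
        return raw
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def check_record(record: dict, entity_field: str) -> bool:
    """Hypothetical 'query and verification' step: require a non-empty entity value."""
    value = record.get(entity_field)
    return isinstance(value, str) and value.strip() != ""


def extract_triples(raw_records: list, entity_field: str) -> list[Triple]:
    """Split each valid record into its entity field and entity attribute fields."""
    triples: list[Triple] = []
    for raw in raw_records:
        record = normalize(raw)
        if not check_record(record, entity_field):
            continue  # records without an entity are dropped
        entity = record[entity_field].strip()
        for field, value in record.items():
            if field == entity_field or value in (None, ""):
                continue
            triples.append((entity, field, str(value)))
    return triples


records = [
    {"name": "Zhang San", "cell phone": "131xxxxxxxx"},
    "name=Li Si; hometown=Hangzhou",   # non-standard text record
    {"cell phone": "139xxxxxxxx"},     # fails verification: no entity
]
print(extract_triples(records, "name"))
# [('Zhang San', 'cell phone', '131xxxxxxxx'), ('Li Si', 'hometown', 'Hangzhou')]
```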
Fig. 2 is a flowchart of a database retrieval method based on entity attribute relationships in an embodiment. As shown in the figure, the method includes the following steps:
S1: Receive the query content input by a user through the application interface; for example, the user can enter custom query content on the application interface of the application layer.
S2: Determine the query mode from the query content, where the query mode is semantic query, relational query, or direct query.
S3: If the query is a semantic query, provide a unified structured semantic query interface, convert the query content into a standard data format, and call a preset algorithm to analyze the entities, attributes, and relationships in the query content, where the analysis includes Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing, text classification, and the like.
If the query is a direct query, either steps S4 and S5 are executed, or step S5 is executed directly, depending on what is being queried.
S4: Compare the query content with the triple data in the preset database. Note that for a semantic query, the analyzed query content is compared with the triple data in the preset database; for a relational query, the analysis of step S3 is not needed, so the query content compared is exactly what the user entered.
A direct query can query not only the triple data stored in the preset database 303 but also the basic information base, the system data, or the blockchain data. If the direct query targets triple data, steps S4 and S5 are performed; if it targets the basic information base or system data instead, step S4 (comparison with the triple data stored in the preset database 303) is skipped and step S5 is executed directly to obtain the basic information base data or system data.
In addition, because step S4 compares against triple data pre-stored in the preset database, the following steps S01 to S03 are performed before step S1:
S01: Read in the external source data and the basic information base, convert the external source data into the standard data format, and store it together with the basic information base in the preset database.
S02: After querying and verifying the external source data, call a preset algorithm to extract triple data. Specifically, the external source data is identified as entity fields and entity attribute fields, a preset algorithm is called to convert some of the non-standard data in the external source data into an entity-attribute structure, and the relationships between entities and their associated attributes are extracted.
S03: Establish a data index mapping for the triple data and store the triple data in the preset database.
S5: Obtain the retrieval result from the preset database and return it to the application interface.
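Steps S2 to S5 amount to a dispatch on the query mode. The sketch below is a hypothetical illustration only: the rule that the keys present in the query determine its mode, the `analyze` stand-in for the S3 NLP pipeline (a longest-substring match against known entities), and the names `stores`, `source`, and `key` are all assumptions rather than details from the patent.

```python
from enum import Enum
from typing import Optional

Triple = tuple[str, str, str]  # [entity, attribute, relationship]


class Mode(Enum):
    SEMANTIC = "semantic"
    RELATIONAL = "relational"
    DIRECT = "direct"


def judge_mode(query: dict) -> Mode:
    """S2 (hypothetical rule): the keys present in the query decide the mode."""
    if "source" in query:
        return Mode.DIRECT
    if "entity" in query:
        return Mode.RELATIONAL
    return Mode.SEMANTIC


def analyze(text: str, known_entities: set[str]) -> Optional[str]:
    """S3 stand-in: return the longest known entity mentioned in free text."""
    for name in sorted(known_entities, key=len, reverse=True):
        if name in text:
            return name
    return None


def retrieve(query: dict, triples: list[Triple], stores: dict[str, dict]) -> list:
    mode = judge_mode(query)
    if mode is Mode.SEMANTIC:
        entity = analyze(query["text"], {t[0] for t in triples})
    elif mode is Mode.RELATIONAL:
        entity = query["entity"]  # no S3: compared exactly as the user typed it
    elif query["source"] != "triples":
        # Direct query on the basic information base or system data: skip S4, go to S5.
        return [stores[query["source"]].get(query["key"])]
    else:
        entity = query["key"]  # direct query on triple data: S4 then S5
    if entity is None:
        return []
    # S4: compare against the stored triples; the returned list is the S5 result.
    return [t for t in triples if t[0] == entity]


triples = [
    ("Zhang San", "Hangzhou", "from"),
    ("Zhang San", "cell phone", "131xxxxxxxx"),
    ("Li Si", "Hangzhou native", "is"),
]
stores = {"region_code": {"0571": "Hangzhou"}}  # hypothetical basic information base
print(retrieve({"entity": "Li Si"}, triples, stores))  # [('Li Si', 'Hangzhou native', 'is')]
print(retrieve({"source": "region_code", "key": "0571"}, triples, stores))  # ['Hangzhou']
```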
In one example, a database retrieval device is provided, which includes a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the database retrieval method based on entity attribute relationships, including: receiving query content input by a user through an application interface, and determining the query mode from the query content, where the query mode is semantic query, relational query, or direct query:
If the query is a semantic query, a unified structured semantic query interface is provided, the query content is converted into a standard data format, a preset algorithm is called to analyze the entities, attributes, and relationships in the query content, the analyzed query content is compared with the triple data in the preset database, and the retrieval result is obtained from the preset database and returned to the application interface. The analysis includes Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing, and text classification.
If the query is a relational query, the preset algorithm is not called to analyze the query content; instead, a preset algorithm is called directly to compare the query content with the triple data pre-stored in the preset database, and the resulting retrieval result is returned to the application interface.
If the query is a direct query, the preset algorithm is not called to analyze the query content; instead, the data in the preset database, which may be triple data, basic information base data, or system data, is queried directly, and the resulting retrieval result is returned to the application interface.
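Chinese word segmentation is the first analysis step listed for semantic queries. The patent does not name its algorithm; forward maximum matching, a classic dictionary-based technique, is shown below purely as an illustration, with a small hypothetical dictionary. Production segmenters typically add statistical models to resolve ambiguities that greedy matching handles poorly.

```python
def fmm_segment(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word is found;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary: "张三" is Zhang San, "手机号码" is "cell phone number".
vocab = {"张三", "手机", "号码", "手机号码"}
print(fmm_segment("张三的手机号码", vocab))  # ['张三', '的', '手机号码']
```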
Organizing data by entities, attributes, and relationships in this way facilitates data association analysis and provides a basis for data mining.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (2)

1. A database retrieval system based on entity attribute relationships, comprising an application layer, a service layer, and a data layer, characterized in that:
the service layer comprises a natural language processing module, an intelligent retrieval module, and a data interface module;
the data layer comprises a basic function module, a data execution module and a preset database;
the application layer is used for receiving query content input by a user through the application interface; the data interface module is used for connecting the application layer and the service layer, forwarding the query content to the intelligent retrieval module, and returning the retrieval result to the application layer; the intelligent retrieval module is used for determining the query mode from the query content, wherein the query mode comprises semantic query, relational query, and direct query, providing a unified structured semantic query interface, and calling the natural language processing module to analyze the query content;
the natural language processing module is used for converting the query content into a standard data format and calling a preset algorithm to analyze the entities, attributes, and relationships of the query content;
the analysis comprises Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword and phrase extraction, pinyin conversion, text recommendation, dependency parsing, and text classification;
the data execution module is used for querying and verifying external source data and then invoking a preset algorithm to extract triple data;
the basic function module is used for establishing a data index mapping for the triple data and then storing the triple data in the preset database; the basic function module adopts distributed computing;
the preset database is used for storing data, the stored data comprising the triple data, external source data, a basic information base, and system data;
the retrieval system also comprises a map module, wherein the map module is used for converting the external source data into graphic data and returning the graphic data to the application layer through the data interface module;
the data execution module comprises a query and check unit, a triple extraction unit and a read-write control unit;
the query and verification unit is used for performing query and verification on the external source data;
the triple extraction unit is used for identifying the external source data as entity fields and entity attribute fields, calling a preset algorithm to convert some of the non-standard data in the external source data into an entity-attribute structure, and extracting the relationships between entities and their associated attributes;
the read-write control unit is used for storing the triple data to the preset database and reading the data stored in the preset database;
the application layer comprises a system management module, a log management module, a data display module and an external calling module;
the system management module is used for configuring the system according to user needs, the log management module is used for viewing the logs and running status of the current system, the data display module is used for viewing the data in the current database in order to debug statistical data, and the external calling module is used for calling external source data.
2. A database retrieval device based on entity attribute relationships, comprising a display terminal, wherein the display terminal is provided with the database retrieval system based on entity attribute relationships according to claim 1, and the database retrieval device adopts a distributed computing framework.
CN202010334936.3A 2020-04-24 2020-04-24 Database retrieval method, system and equipment based on entity attribute relationship Active CN111552788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010334936.3A CN111552788B (en) 2020-04-24 2020-04-24 Database retrieval method, system and equipment based on entity attribute relationship

Publications (2)

Publication Number Publication Date
CN111552788A CN111552788A (en) 2020-08-18
CN111552788B true CN111552788B (en) 2021-08-20

Family

ID=72005878

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant