CN113849596A - Intelligent search method based on natural language processing

Info

Publication number: CN113849596A
Application number: CN202110917325.6A
Authority: CN
Inventors: 鲍捷; 张强; 宋劼; 陆晓晖
Original assignee: Hefei Wenyin Internet Technology Co Ltd
Current assignee: Hefei Wenyin Internet Technology Co Ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2021-12-28

Abstract

The invention discloses an intelligent search method based on natural language processing. The method uses an intelligent search engine comprising a semantic analysis engine and a query engine. The semantic analysis engine analyzes the query entities, the known attribute conditions and the variable attributes, together with the relationships among them, and performs logical reasoning on entity associations, so as to generate a structured semantic representation that accurately describes the user's query intention. The query engine automatically generates and executes the corresponding database query statement (SQL) according to the output of the semantic analysis engine and returns the query result. The invention addresses the slow search speed, poor convenience and low accuracy of the prior art.

Description

Intelligent search method based on natural language processing

Technical Field

The invention relates to the technical fields of intelligent search, natural language processing and knowledge graphs, and in particular to an intelligent search method based on natural language processing.

Background

Conventional search methods include directory search, keyword search and fuzzy search. The database behind a directory search is built manually or semi-automatically; entries must be filed layer by layer according to a classification and are searched layer by layer, so searching is slow. Keyword search returns too much information, much of it irrelevant, and the user has to screen the results one by one. To reduce this information overload, the user must enter several keywords and refine the query progressively. Fuzzy search, that is, synonym search, cannot help a user find the precise information needed in the shortest time.

Page analysis in existing traditional search technology is based on the link relationships between pages. It retrieves information mainly through keyword decomposition and matching, cannot represent what information a page contains, handles the semantics of page content poorly, and lacks the ability to process and understand knowledge.

With the rapid development of search engines, search technology has continued to improve, and search engines are moving toward greater intelligence and personalization. In recent years, the industry has introduced a new generation of supporting technologies for information search, such as the semantic web and knowledge graphs, and has developed intelligent information search engine products that greatly improve the user experience in terms of information diversity, search convenience and result accuracy.

Intelligent search is a search method that analyzes information objects and retrieval requests from the perspective of knowledge understanding and logical reasoning. Its biggest difference from a traditional search engine is that both the search process and the results are intelligent. Technologies such as the semantic web and knowledge graphs can fully express the semantic relationships of information objects, so that the user's retrieval needs and the content of the information objects are effectively understood, and the search engine gains the ability to understand semantics and reason effectively.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide an intelligent search method based on natural language processing that addresses the slow search speed, poor convenience and low accuracy of the prior art.

To achieve this purpose, the invention adopts the following technical solution: an intelligent search method based on natural language processing, comprising the following steps:

1. Acquiring information. Data is collected through public channels.

2. Parsing and storing the data. The crawled data is parsed into a suitable format by parsers and stored in a database. The parsers comprise a type parser and a format parser, and the format parser converts complex data types and formats into a uniform format.

3. Integrating and analyzing the data. The data is further integrated and analyzed by algorithms, including a data deduplication algorithm, an information extraction algorithm and an information classification algorithm.

4. Establishing an entity knowledge base. Using techniques such as Chinese word segmentation, part-of-speech tagging, identification tagging and rule matching, the information is structurally analyzed at the paragraph and sentence level, and entities and relations are extracted. The entity knowledge base is then built with a word vector model through steps such as inverted indexing, keyword optimization, similarity ranking and entity relationship matching.

5. Returning a retrieval result through the query engine of the intelligent search engine.
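As a rough illustration of the deduplication, inverted indexing and similarity ranking named in steps 3 and 4, the following Python sketch may help. It is not the method of the invention: the function names are hypothetical, the whitespace tokenizer stands in for Chinese word segmentation, the ranking counts shared terms instead of using a word vector model, and the sample documents are made up.

```python
import hashlib
from collections import defaultdict


def deduplicate(docs):
    """Drop duplicates by hashing whitespace- and case-normalized text
    (a simple stand-in for a data deduplication algorithm)."""
    seen, unique = set(), []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def tokenize(text):
    """Whitespace tokenizer; a real system would use Chinese word segmentation."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in tokenize(doc):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Rank documents by the number of distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample documents
docs = deduplicate([
    "Acme Corp total market value",
    "acme corp  total market value",  # duplicate after normalization
    "Beta Ltd financing amount",
])
index = build_inverted_index(docs)
ranked = search(index, "acme market value")
```

Here `ranked` lists the ids of matching documents, best match first; documents that share no token with the query are not returned.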

The intelligent search method uses an intelligent search engine comprising a semantic analysis engine and a query engine. The semantic analysis engine analyzes the query entities, the known attribute conditions and the variable attributes, together with the relationships among them, and can perform logical reasoning on entity associations, so as to generate a structured semantic representation that accurately describes the user's query intention.

The query engine automatically generates and executes the corresponding database query statement (SQL) according to the output of the semantic analysis engine and returns the query result.

The invention has the following beneficial effects:

1. Technical advancement: based on the self-developed semantic parser M-Parser, query semantics can be understood accurately and rapidly.

2. High extensibility: the system is configured automatically from structured data, with a "zero training corpus" cold start, so the data does not need to be labeled.

3. High maintainability: question templates do not need to be constructed manually.

Drawings

The invention is described in detail below with reference to the drawings and the detailed description;

FIG. 1 is a diagram of the system infrastructure of the present invention;

FIG. 2 is a schematic diagram of a semantic analyzer MPar according to the present invention.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

Referring to FIGS. 1-2, the present embodiment adopts the following technical solution: an intelligent search method based on natural language processing.

The intelligent search method adopts an intelligent search engine, and the intelligent search engine comprises a semantic analysis engine and a query engine.

The semantic analysis engine decomposes a specific question by simulating the way a human thinks about it, and finally generates a semantic graph. The semantic analyzer uses the named entities and inter-entity relationships identified by the upstream module, constructs the subject of the query statement based on a knowledge graph or database, analyzes the constraints on that subject through steps such as question segmentation and constraint analysis, and finally generates a semantic graph and sends it to the query engine. Since this embodiment addresses the understanding of query intent, the abstract semantics of a query can be represented by entities, attribute elements and their relationships:

1. Focus entity: the type of entity whose information is to be looked up, for example basic company information or financing coupon information.

2. Attributes to be screened: which attribute of which entity a screening condition in the question applies to, and the constraint on the attribute value. For example, the attribute "financing bid amount" must be greater than a certain value, such as 1 billion.

3. Target attribute of the query: what information about the focus entity is to be queried, for example the information point "total market value".

In addition, because entity attributes are associated with one another, complex query semantics may contain components that have to be inferred.
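One possible way to hold these three elements in a single structure is sketched below in Python. The class and field names are assumptions made for illustration and do not come from the invention; the sample question is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """A known attribute with a constraint on its value (a screening condition)."""
    entity: str
    attribute: str
    operator: str
    value: float


@dataclass
class SemanticGraph:
    """Structured representation of a query intent (illustrative only)."""
    focus_entity: str
    conditions: List[Condition] = field(default_factory=list)
    target_attributes: List[str] = field(default_factory=list)


# Hypothetical question: "What is the total market value of companies
# whose financing amount exceeds 1 billion?"
graph = SemanticGraph(
    focus_entity="company",
    conditions=[Condition("company", "financing_amount", ">", 1_000_000_000)],
    target_attributes=["total_market_value"],
)
```

The condition list carries the attributes to be screened, and the target list carries the attributes whose values the user wants returned.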

The semantic analysis engine in the present embodiment directly uses the self-developed semantic analyzer MPar. Using the entity, attribute and attribute value model built by the data pre-modeling module, it analyzes attribute values, known attributes, variable attributes and entities step by step, determines possible entity associations, and finally generates a semantic graph. The semantic analyzer is driven by data modeling: different data models can be generated from the data accessed in different fields, so that different entities are identified and possible relationships are matched. Finally, a highly abstract disambiguation module performs disambiguation.

The query engine receives the semantic graph output by the semantic analysis engine, converts the semantic analysis result into refined sub-query relations, executes the corresponding queries through different types of query components, and returns the query results after the engine has checked their validity and processed them uniformly. The query engine combines the result of semantic analysis, the query logic of the business and the data analysis of the knowledge base to plan the query sensibly. This integration is divided into two steps. The first step generates an abstract query structure that is independent of the underlying data storage. The second step generates an actual executable query statement and runs it to obtain the query result.

One important reason for separating the two steps is that the format and generation logic of the concrete query statement differ across data sources and query modes. Examples include structured queries executed as SQL statements against a relational database, text searches over a column of a database that can store objects (e.g., PostgreSQL), and Elasticsearch search queries. With the two steps separated, the first step serves as a common abstraction over the different concrete query operations of the second step, which allows a high degree of reuse and flexible extension when a new data source is added. The two steps are implemented by the two core sub-modules of the query engine:

1. The Query Builder converts the semantic graph into an abstract query structure. The Query Builder is essentially a kind of middleware designed to enable efficient queries.

2. The Query Resolver generates the final executable query statement from the query structure, executes it and returns the result. The system may have multiple Query Resolvers; the one introduced in this section generates SQL query statements for a general relational database and is also referred to as the table resolver. In a system implementation, if ordinary text search is to be supported, a text resolver can be implemented to perform string-based retrieval over the text in certain columns.
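The split between the two sub-modules can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the semantic graph is reduced to a plain dictionary, the function names `build_query` and `resolve_sql` are hypothetical, SQLite stands in for the relational database, and the table and rows are made up. Table and column names are assumed to come from a trusted schema, since they are interpolated into the statement; only values are passed as parameters.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<="}


def build_query(graph):
    """Query Builder: turn a semantic graph (here a plain dict) into an
    abstract query structure that does not depend on the storage backend."""
    return {
        "source": graph["focus_entity"],
        "select": list(graph["targets"]),
        "filters": [tuple(c) for c in graph["conditions"]],
    }


def resolve_sql(query):
    """Table resolver: render the abstract query as parameterized SQL.
    Identifiers are assumed to come from a trusted schema."""
    clauses, params = [], []
    for attribute, operator, value in query["filters"]:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{attribute} {operator} ?")
        params.append(value)
    sql = f"SELECT {', '.join(query['select'])} FROM {query['source']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Made-up data in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (name TEXT, financing_amount REAL, total_market_value REAL)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [("A", 2e9, 5e10), ("B", 5e8, 1e10)],
)

graph = {
    "focus_entity": "company",
    "targets": ["name", "total_market_value"],
    "conditions": [("financing_amount", ">", 1e9)],
}
sql, params = resolve_sql(build_query(graph))
rows = conn.execute(sql, params).fetchall()
```

Under this design, supporting a text search backend would mean adding another resolver that consumes the same abstract structure, while `build_query` stays unchanged.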

The data presented by the knowledge graph part of the system requirements is also produced by the query engine. With the knowledge graph as an aid, the question answering system can gain insight into the semantic information behind a user query, return more accurate and efficient information, and better meet the user's query needs. Since the input data of the project is mostly structured data, the knowledge graph work is reflected in the following two aspects:

1. Relation linking: information in the different structured data sets is linked to the corresponding listed company by company code and short name.

2. Graph query and display: the related basic information, regulatory information and transaction information is queried from the database by company code and short name.
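The relation linking described above can be illustrated with a short Python sketch. The function name, the record fields `code` and `short_name`, and the sample records are all assumptions made for the example, not details from the invention.

```python
def link_by_company(listed_companies, *datasets):
    """Attach records from several structured data sets to the listed company
    they refer to, matching on company code first and short name second."""
    known_codes = {c["code"] for c in listed_companies}
    code_by_name = {c["short_name"]: c["code"] for c in listed_companies}
    linked = {c["code"]: [] for c in listed_companies}
    for dataset in datasets:
        for record in dataset:
            code = record.get("code")
            if code not in known_codes:
                # Fall back to the short name when the code is missing or unknown.
                code = code_by_name.get(record.get("short_name"))
            if code is not None:
                linked[code].append(record)
    return linked


# Made-up sample data
companies = [{"code": "000001", "short_name": "Alpha"}]
regulatory = [{"code": "000001", "notice": "inquiry letter"}]
transactions = [{"short_name": "Alpha", "volume": 100}, {"short_name": "Gamma", "volume": 5}]

linked = link_by_company(companies, regulatory, transactions)
```

Records that match neither a known code nor a known short name are left unlinked in this sketch.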

The semantic analyzer MPar of the present embodiment has the following advantages:

1. Self-learning from structured data:

modeling with the semantic matching constraints contained in graphs and tables;

eliminating a large amount of pseudo-ambiguity;

no large-scale training is needed.

2. Native support for reasoning and cross-table queries.

3. Key parameters can be configured and optimized.

4. Analysis flow: semantic parsing is decomposed into relatively simple classification or matching tasks.

MPar analysis procedure:

Attribute-attribute value matching: attribute value nodes such as amounts and names are matched (attached) to the best attribute node (such as total market value or name).

Attribute classification and reasoning: determining which attributes are known attribute nodes (screening conditions) and which are variable attribute nodes (query targets); inferring and resolving virtual attribute nodes.

Semantic graph construction: constructing the semantic graph of a single entity; searching for semantic fragment paths; finding the best combination of semantic fragments that covers the original question; ranking the nodes corresponding to the semantic fragments by their feature weights.

Entity relationship reasoning: constructing a cross-entity semantic graph from attribute associations across entities.

5. Query rewriting module: based on technologies such as BERT, it supports automatic rewriting and re-analysis of unrecognized information in the question, so that vague everyday expressions can be understood.

6. Term configuration back end: a graphical back-end interface that can be configured flexibly, takes effect immediately and helps interpret jargon.
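The first two stages of the analysis procedure, attribute-attribute value matching and attribute classification, can be pictured with the Python sketch below. It is a simplified heuristic and not MPar itself: the schema, the nearest-mention rule and the function names are assumptions chosen for illustration.

```python
import re

# Hypothetical schema: attribute name -> type of value it accepts
SCHEMA = {"total market value": "amount", "financing amount": "amount"}


def find_attributes(question):
    """Return (position, attribute) pairs for schema attributes in the question."""
    text = question.lower()
    return sorted((text.find(name), name) for name in SCHEMA if name in text)


def find_amounts(question):
    """Return (position, value) pairs for numeric values in the question."""
    return [(m.start(), float(m.group())) for m in re.finditer(r"\d+(?:\.\d+)?", question)]


def classify(question):
    """Attach each amount to the nearest amount-typed attribute mention, then
    split attributes into known (with a value) and variable (without one)."""
    attributes = find_attributes(question)
    known = {}
    for value_pos, value in find_amounts(question):
        candidates = [
            (abs(value_pos - pos), name)
            for pos, name in attributes
            if SCHEMA[name] == "amount" and name not in known
        ]
        if candidates:
            known[min(candidates)[1]] = value
    variable = [name for _, name in attributes if name not in known]
    return known, variable


known, variable = classify(
    "Show total market value of companies with financing amount above 1000000000"
)
```

In this example the amount is attached to "financing amount", which becomes a known attribute (a screening condition), while "total market value" receives no value and is treated as the variable attribute (the query target).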

The query engine of the specific embodiment has the advantages that:

1. The Query Resolver supports automatic generation of SQL code.

A. SQL code is generated in an object-relational mapping (ORM) manner. The complex low-level operations of SQL statement generation are handled automatically, and the other parts of the system only need to provide database queries expressed with objects, functions and the like.

B. SQL dialect conversion is completed automatically according to the underlying database.

2. Asynchronous query execution flow: asynchronous queries avoid blocking the processing flow of the whole system.

A. The front end displays results page by page, and the back end responds quickly with the visible page of results.

B. The back end continues querying asynchronously and caches the pages of results that have not yet been read.

C. The in-memory cache has excellent read/write performance and scalability.
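The asynchronous execution flow can be sketched with Python's `asyncio` as follows. The class `PagedQuery`, the stand-in `fake_fetch` coroutine and the page sizes are hypothetical; a dictionary plays the role of the in-memory cache.

```python
import asyncio


class PagedQuery:
    """Serve the requested page quickly and prefetch later pages in the background."""

    def __init__(self, fetch_page, total_pages):
        self._fetch_page = fetch_page  # coroutine function: page number -> rows
        self._total_pages = total_pages
        self._cache = {}               # in-memory cache: page number -> rows
        self._prefetch_task = None

    async def _prefetch(self, start):
        # Continue querying the unread pages and cache their results.
        for page in range(start, self._total_pages):
            if page not in self._cache:
                self._cache[page] = await self._fetch_page(page)

    async def get_page(self, page):
        if page not in self._cache:
            self._cache[page] = await self._fetch_page(page)
        if self._prefetch_task is None:
            # Schedule the remaining pages without blocking the caller.
            self._prefetch_task = asyncio.create_task(self._prefetch(page + 1))
        return self._cache[page]

    async def wait_prefetched(self):
        if self._prefetch_task is not None:
            await self._prefetch_task


async def fake_fetch(page):
    """Stand-in for a slow database query; returns two made-up rows per page."""
    await asyncio.sleep(0.01)
    return [page * 2, page * 2 + 1]


async def demo():
    query = PagedQuery(fake_fetch, total_pages=3)
    first = await query.get_page(0)
    await query.wait_prefetched()
    return first, sorted(query._cache)


first_page, cached_pages = asyncio.run(demo())
```

The caller receives the first page as soon as it is fetched, while the remaining pages are filled into the cache by the background task.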

The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention. Various changes and modifications may be made without departing from the spirit and scope of the invention, and all of them fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. An intelligent search method based on natural language processing, characterized by comprising the following steps:

(1) acquiring information: data is collected through public channels;

(2) parsing and storing the data: the crawled data is parsed into a suitable format by parsers and stored in a database, wherein the parsers comprise a type parser and a format parser, and the format parser converts complex data types and formats into a uniform format;

(3) integrating and analyzing the data: the data is further integrated and analyzed by algorithms, wherein the algorithms comprise a data deduplication algorithm, an information extraction algorithm and an information classification algorithm;

(4) establishing an entity knowledge base: the information is structurally analyzed at the paragraph and sentence level using techniques such as Chinese word segmentation, part-of-speech tagging, identification tagging and rule matching, and entities and relations are extracted; the entity knowledge base is then built with a word vector model through steps such as inverted indexing, keyword optimization, similarity ranking and entity relationship matching;

(5) returning a retrieval result through the query engine of the intelligent search engine.

2. The intelligent search method based on natural language processing according to claim 1, wherein the intelligent search method uses an intelligent search engine comprising a semantic analysis engine and a query engine; the semantic analysis engine analyzes the query entities, the known attribute conditions and the variable attributes, together with the relationships among them, and performs logical reasoning on entity associations, so as to generate a structured semantic representation that accurately describes the user's query intention;