CN113849596A - Intelligent search method based on natural language processing - Google Patents

Info

Publication number
CN113849596A
Authority
CN
China
Prior art keywords
query
engine
intelligent search
data
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110917325.6A
Other languages
Chinese (zh)
Inventor
鲍捷
张强
宋劼
陆晓晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Wenyin Internet Technology Co Ltd
Original Assignee
Hefei Wenyin Internet Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Wenyin Internet Technology Co Ltd filed Critical Hefei Wenyin Internet Technology Co Ltd
Priority to CN202110917325.6A priority Critical patent/CN113849596A/en
Publication of CN113849596A publication Critical patent/CN113849596A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent search method based on natural language processing. The method uses an intelligent search engine comprising a semantic analysis engine and a query engine. The semantic analysis engine parses the query entities, the known attribute conditions, and the variable attributes, together with the relationships among them, and performs logical reasoning over entity associations to generate a structured semantic representation that accurately describes the user's query intent. The query engine automatically generates and executes the corresponding database query statement (SQL) from the output of the semantic analysis engine and returns the query result. The invention addresses the slow search speed, poor convenience, and low accuracy of the prior art.

Description

Intelligent search method based on natural language processing
Technical Field
The invention relates to the technical fields of intelligent search, natural language processing, and knowledge graphs, and in particular to an intelligent search method based on natural language processing.
Background
Conventional search methods include directory search, keyword search, and fuzzy search. A directory search database is built manually or semi-automatically: entries must be filed by hand, level by level, according to a classification scheme, and users must browse level by level, so searching is slow. Keyword search returns too much information, much of it irrelevant, and the user has to screen the results one by one; to reduce this overload, the user must enter several keywords and refine the query step by step. Fuzzy search, that is, synonym search, cannot help a user find the precise information needed in the shortest time.
Existing conventional search technology analyzes pages on the basis of the link relationships between them and retrieves information mainly through keyword decomposition and matching. It cannot represent what information a page contains, handles the semantics of page content poorly, and lacks the ability to process and understand knowledge.
As search engines have developed rapidly, search technology has continued to improve and is moving toward greater intelligence and personalization. In recent years the industry has introduced a new generation of supporting technologies for information search, such as the semantic web and knowledge graphs, and has developed intelligent search engine products that greatly improve the user experience in terms of information diversity, search convenience, and result accuracy.
Intelligent search analyzes information objects and retrieval requests from the perspective of knowledge understanding and logical reasoning. Its biggest difference from a conventional search engine is that both the search process and the results are intelligent: technologies such as the semantic web and knowledge graphs can fully express the semantic relationships of information objects, so the engine can understand both the user's retrieval needs and the content of the information objects, and can reason over them effectively.
Disclosure of Invention
In view of the shortcomings of the prior art, the object of the invention is to provide an intelligent search method based on natural language processing that addresses the slow search speed, poor convenience, and low accuracy of the prior art.
To achieve this object, the invention adopts the following technical solution: an intelligent search method based on natural language processing, comprising the following steps:
1. Acquire information. Data is collected through public channels.
2. Parse and store the data. A parser converts the crawled data into a suitable format and stores it in a database. The parser comprises a type parser and a format parser; the format parser converts complex data types and formats into a uniform format.
3. Integrate and analyze the data. The data is further integrated and analyzed by algorithms, including a data deduplication algorithm, an information extraction algorithm, and an information classification algorithm.
4. Build an entity knowledge base. Techniques such as Chinese word segmentation, part-of-speech tagging, identification tagging, and rule matching are used to analyze the structure of the information at the paragraph and sentence level and to extract entities and relations. The entity knowledge base is then built with a word vector model through steps such as inverted indexing, keyword optimization, similarity ranking, and entity relationship matching.
5. Return the retrieval result through the query engine of the intelligent search engine.
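The inverted indexing and ranking mentioned in step 4 can be sketched as follows. This is only a minimal illustration and not the patented implementation: the function names are hypothetical, whitespace tokenization stands in for Chinese word segmentation, and the ranking simply counts matching query terms instead of using a word vector model.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace splitting stands in for Chinese word segmentation.
    return text.lower().split()


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many distinct query tokens they contain."""
    scores: dict[str, int] = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    "a": "Acme financing bond",
    "b": "Acme market value",
}
index = build_inverted_index(docs)
ranked = search(index, "acme bond")
```

Here document "a" matches both query terms and document "b" only one, so "a" is ranked first.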
The intelligent search method uses an intelligent search engine comprising a semantic analysis engine and a query engine. The semantic analysis engine parses the query entities, the known attribute conditions, and the variable attributes, together with the relationships among them, and can perform logical reasoning over entity associations, thereby generating a structured semantic representation that accurately describes the user's query intent.
The query engine automatically generates and executes the corresponding database query statement (SQL) from the output of the semantic analysis engine and returns the query result.
The invention has the following beneficial effects:
1. Technical advancement: it is based on the self-developed semantic parser M-Parser, which understands query semantics accurately and quickly.
2. High extensibility: the system is configured automatically from structured data, and it supports a "zero training corpus" cold start, so no data needs to be annotated.
3. High maintainability: question templates do not need to be constructed manually.
Drawings
The invention is described in detail below with reference to the drawings and the detailed description.
FIG. 1 is a diagram of the system infrastructure of the present invention;
FIG. 2 is a schematic diagram of the semantic analyzer MPar of the present invention.
Detailed Description
To make the technical means, inventive features, objects, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to FIGS. 1-2, this embodiment adopts the following technical solution: an intelligent search method based on natural language processing, comprising the following steps:
1. Acquire information. Data is collected through public channels.
2. Parse and store the data. A parser converts the crawled data into a suitable format and stores it in a database. The parser comprises a type parser and a format parser; the format parser converts complex data types and formats into a uniform format.
3. Integrate and analyze the data. The data is further integrated and analyzed by algorithms, including a data deduplication algorithm, an information extraction algorithm, and an information classification algorithm.
4. Build an entity knowledge base. Techniques such as Chinese word segmentation, part-of-speech tagging, identification tagging, and rule matching are used to analyze the structure of the information at the paragraph and sentence level and to extract entities and relations. The entity knowledge base is then built with a word vector model through steps such as inverted indexing, keyword optimization, similarity ranking, and entity relationship matching.
5. Return the retrieval result through the query engine of the intelligent search engine.
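As an illustration of the data deduplication named in step 3, the sketch below removes records whose normalized content is identical. The patent does not specify the deduplication algorithm; hashing a normalized form of each record is an assumption made for this example, and the function names and sample records are hypothetical.

```python
import hashlib


def normalize(record: dict) -> str:
    """Build a canonical string: sorted keys, trimmed and lower-cased values."""
    return "|".join(
        f"{key}={str(value).strip().lower()}" for key, value in sorted(record.items())
    )


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each distinct normalized content hash."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


records = [
    {"name": "Acme ", "code": "000001"},
    {"code": "000001", "name": "acme"},  # same content after normalization
    {"name": "Beta", "code": "000002"},
]
unique_records = deduplicate(records)
```

Exact-match hashing only catches records that become identical after normalization; near-duplicate detection would need a similarity measure instead.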
The intelligent search method uses an intelligent search engine comprising a semantic analysis engine and a query engine.
The semantic analysis engine breaks a specific question down by simulating how a person reasons about it, and finally generates a semantic graph. Using the named entities and inter-entity relationships identified by the upstream module, the semantic analyzer constructs the subject of the query statement on the basis of a knowledge graph or database, analyzes the constraints on that subject through steps such as question segmentation and constraint analysis, and finally generates a semantic graph and sends it to the query engine. Because this embodiment addresses the understanding of query intent, the abstract semantics of a query can be represented by entities, attribute elements, and the relationships among them:
1. The focus entity: the type of entity about which information is sought, for example company basic information or financing coupon information.
2. The attributes to be filtered: which attribute of which entity the filter condition in the question applies to, and the constraint on that attribute's value. For example, the attribute "financing bid amount" must be greater than a certain value, such as 1 billion.
3. The target attributes of the query: which information about the focus entity is requested, for example the information point "total market value".
In addition, complex query semantics may contain components that must be inferred from the associations between entity attributes.
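The three elements above suggest a simple data structure for the semantic representation. The following is a minimal sketch under stated assumptions: the class and field names (`SemanticGraph`, `Condition`, and so on) are hypothetical and are not taken from the patent, and the example values mirror the illustrations in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    """A filter on one attribute of one entity (a known attribute condition)."""
    entity: str
    attribute: str
    operator: str
    value: object


@dataclass
class SemanticGraph:
    """Hypothetical container for the abstract semantics of one query."""
    focus_entity: str                                            # what kind of entity is sought
    conditions: list[Condition] = field(default_factory=list)    # attributes to be filtered
    target_attributes: list[str] = field(default_factory=list)   # variable attributes to return


# "Total market value of companies whose financing amount exceeds 1 billion"
graph = SemanticGraph(
    focus_entity="company",
    conditions=[Condition("company", "financing_amount", ">", 1_000_000_000)],
    target_attributes=["total_market_value"],
)
```

Keeping conditions and target attributes in separate fields reflects the distinction the text draws between known attribute conditions and variable attributes.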
The semantic analysis engine in this embodiment directly uses the semantic analyzer MPar developed in-house by the applicant. Using the entity, attribute, and attribute value model established by the data pre-modeling module, it parses attribute values, known attributes, variable attributes, and entities step by step, determines possible entity associations, and finally generates a semantic graph. The semantic analyzer is driven by data modeling: it can generate different data models for the data of the different domains it is connected to, and thus identify different entities and match their possible relationships. Finally, a highly abstract disambiguation module performs disambiguation.
The query engine receives the semantic graph output by the semantic analysis engine, converts the semantic analysis result into refined sub-query relations, executes the corresponding queries through query components of different types, and returns the query results after the engine has checked their validity and processed them uniformly. The query engine combines the semantic analysis result, the query logic of the business, and the data analysis of the knowledge base to plan the query sensibly. This process is divided into two steps. The first step generates an abstract query structure that is independent of how the underlying data is stored. The second step generates an actual, executable query statement and runs it to obtain the query result.
An important reason for splitting the process into two steps is that the format and generation logic of the concrete query statement differ across data sources and query modes, for example structured queries executed as SQL statements against a relational database, text search over a column of a database that can store objects (e.g., PostgreSQL), and Elasticsearch search queries. With the two steps separated, the first step serves as a common abstraction over the different concrete query operations of the second step, which gives a high degree of reuse and flexible extension when a new data source is added. The two steps are implemented by the two core sub-modules of the query engine:
1. The Query Builder converts the semantic graph into an abstract query structure. The Query Builder is essentially a piece of middleware designed to enable efficient queries.
2. The Query Resolver generates the final executable query statement from the query structure, executes it, and returns the result. The system may have several Query Resolvers; the one described in this section is the resolver that generates SQL query statements for a general relational database (also called the table resolver). In a system implementation, if ordinary text search is to be supported, a text resolver can be implemented to perform string-based retrieval over the text in certain columns of the data.
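The two-step split between the Query Builder and a table resolver might look like the sketch below. It is an illustration only: the names `AbstractQuery`, `build_query`, and `resolve_sql` are hypothetical, the semantic graph is passed as a plain dictionary, and SQLite (from the Python standard library) stands in for the relational database. Table and column names are assumed to come from a trusted schema; only the filter values are bound as parameters.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    column: str
    operator: str
    value: object


@dataclass(frozen=True)
class AbstractQuery:
    """Storage-independent query structure (step one)."""
    table: str
    select: tuple[str, ...]
    filters: tuple[Filter, ...]


ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<="}


def build_query(graph: dict) -> AbstractQuery:
    """Query Builder: turn a semantic graph into an abstract query structure."""
    return AbstractQuery(
        table=graph["focus_entity"],
        select=tuple(graph["target_attributes"]),
        filters=tuple(Filter(*condition) for condition in graph["conditions"]),
    )


def resolve_sql(query: AbstractQuery) -> tuple[str, list]:
    """Table resolver: render the abstract query as parameterized SQL (step two)."""
    for f in query.filters:
        if f.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {f.operator}")
    columns = ", ".join(f'"{column}"' for column in query.select)
    sql = f'SELECT {columns} FROM "{query.table}"'
    if query.filters:
        where = " AND ".join(f'"{f.column}" {f.operator} ?' for f in query.filters)
        sql += f" WHERE {where}"
    return sql, [f.value for f in query.filters]


def run(conn: sqlite3.Connection, graph: dict) -> list[tuple]:
    sql, params = resolve_sql(build_query(graph))
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (name TEXT, financing_amount INTEGER, total_market_value INTEGER)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [("Acme", 2_000_000_000, 50), ("Beta", 500, 10)],
)
graph = {
    "focus_entity": "company",
    "target_attributes": ["name", "total_market_value"],
    "conditions": [("financing_amount", ">", 1_000_000_000)],
}
rows = run(conn, graph)
```

Supporting another back end, such as a text search resolver, would then only require a second function that consumes the same `AbstractQuery`.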
The query engine also supplies the data displayed by the knowledge graph part of the system requirements. With the knowledge graph as an aid, the question-answering system can gain insight into the semantic information behind a user query, return more accurate and efficient information, and better meet users' query needs and expectations for question answering. Because most of the project's input data is structured, the knowledge graph work is reflected in the following two aspects:
1. Relation linking: information in the different structured data sets is linked to the corresponding listed company by company code and short name.
2. Graph query and display: the related basic information, regulatory information, and transaction information is queried from the database by company code and short name.
The semantic analyzer MPar of this embodiment has the following advantages:
1. Self-learning from structured data: it builds its model from the semantic matching constraints contained in the graph and the tables, eliminates a large amount of pseudo-ambiguity, and does not require training on a massive corpus.
2. Native support for reasoning and cross-table queries.
3. Key parameters can be configured and tuned.
4. Analysis flow: semantic parsing is decomposed into relatively simple classification or matching tasks. The MPar analysis procedure is as follows:
A. Attribute-attribute value matching: attribute value nodes such as amounts and names are matched (attached) to the best attribute node (such as total market value or name);
B. Attribute classification and reasoning: determining which attributes are known attribute nodes (filter conditions) and which are variable attribute nodes (query targets), and reasoning about and resolving virtual attribute nodes;
C. Semantic graph construction: building the semantic graph of a single entity, searching for semantic fragment paths, finding the best combination of semantic fragments that covers the original question, and ordering the nodes corresponding to the semantic fragments by their feature weights;
D. Entity relationship reasoning: building a cross-entity semantic graph from the attribute associations across entities.
5. Query rewriting module: based on technologies such as BERT, it automatically rewrites and re-analyzes unrecognized information in the question, so that everyday fuzzy expressions can be understood.
6. Terminology configuration back end: a graphical back-end interface allows flexible configuration that takes effect immediately and helps interpret domain jargon.
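The first two stages of the analysis procedure, attaching values to attributes and then separating known from variable attributes, can be sketched as follows. This is a deliberately simplified illustration: the patent does not disclose how MPar scores candidate matches, so the heuristic used here (same value type, nearest position in the question) is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMention:
    name: str
    value_type: str   # e.g. "money", "name"
    position: int     # token offset in the question


@dataclass(frozen=True)
class ValueMention:
    text: str
    value_type: str
    position: int


def match_values(attributes: list[AttributeMention],
                 values: list[ValueMention]) -> dict[str, str]:
    """Attach each value to the nearest attribute mention of the same value type."""
    matches: dict[str, str] = {}
    for value in values:
        candidates = [a for a in attributes if a.value_type == value.value_type]
        if candidates:
            best = min(candidates, key=lambda a: abs(a.position - value.position))
            matches[best.name] = value.text
    return matches


def classify(attributes: list[AttributeMention],
             matches: dict[str, str]) -> tuple[list[str], list[str]]:
    """Attributes with an attached value are known; the rest are query targets."""
    known = [a.name for a in attributes if a.name in matches]
    variable = [a.name for a in attributes if a.name not in matches]
    return known, variable


# "Companies with a financing amount over 1 billion: what is the total market value?"
attributes = [
    AttributeMention("financing_amount", "money", position=2),
    AttributeMention("total_market_value", "money", position=9),
]
values = [ValueMention("1 billion", "money", position=5)]

matches = match_values(attributes, values)
known, variable = classify(attributes, matches)
```

In this example the value "1 billion" is closer to "financing amount" than to "total market value", so the former becomes a filter condition and the latter the query target.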
The query engine of this embodiment has the following advantages:
1. The Query Resolver supports automatic generation of SQL code.
A. SQL code is generated through object-relational mapping (ORM), which automatically handles the complex low-level operations of SQL statement generation; the other parts of the system only need to supply database queries expressed with objects, functions, and the like.
B. SQL dialect conversion is performed automatically according to the underlying database.
2. Asynchronous query execution: asynchronous queries avoid blocking the processing flow of the whole system.
A. The front end displays results page by page, and the back end responds quickly with the visible page of results;
B. The back end continues querying asynchronously and caches the pages that have not yet been read;
C. The in-memory cache offers excellent read/write performance and scalability.
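The asynchronous paging flow described in item 2 can be sketched with Python's `asyncio`. This is a minimal illustration under assumptions, not the described system: `PagedQuery` and its methods are hypothetical names, a plain dictionary stands in for the in-memory cache, and `fetch_page` is any coroutine function that loads one page of results.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class PagedQuery:
    """Return the first page immediately and prefetch later pages in the background."""

    def __init__(self, fetch_page: Callable[[int], Awaitable[list]], total_pages: int):
        self._fetch_page = fetch_page
        self._total_pages = total_pages
        self._cache: dict[int, list] = {}          # stand-in for an in-memory cache
        self.prefetch_task: Optional[asyncio.Task] = None

    async def first_page(self) -> list:
        page = await self.page(0)
        # Keep querying the remaining pages without blocking the caller.
        self.prefetch_task = asyncio.create_task(self._prefetch(start=1))
        return page

    async def page(self, number: int) -> list:
        # Serve from the cache when the page has already been fetched.
        if number not in self._cache:
            self._cache[number] = await self._fetch_page(number)
        return self._cache[number]

    async def _prefetch(self, start: int) -> None:
        for number in range(start, self._total_pages):
            await self.page(number)
```

A caller would await `first_page()` to render the visible results, then request later pages with `page(n)`; pages already prefetched are returned from the cache without another query.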
The foregoing shows and describes the general principles, broad features, and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention. Various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (2)

1. An intelligent search method based on natural language processing, characterized by comprising the following steps:
(1) acquiring information: data is collected through public channels;
(2) parsing and storing the data: a parser converts the crawled data into a suitable format and stores it in a database, wherein the parser comprises a type parser and a format parser, and the format parser converts complex data types and formats into a uniform format;
(3) integrating and analyzing the data: the data is further integrated and analyzed by algorithms, wherein the algorithms comprise a data deduplication algorithm, an information extraction algorithm, and an information classification algorithm;
(4) building an entity knowledge base: techniques such as Chinese word segmentation, part-of-speech tagging, identification tagging, and rule matching are used to analyze the structure of the information at the paragraph and sentence level and to extract entities and relations; the entity knowledge base is then built with a word vector model through steps such as inverted indexing, keyword optimization, similarity ranking, and entity relationship matching;
(5) returning a retrieval result through the query engine of the intelligent search engine.
2. The intelligent search method based on natural language processing according to claim 1, wherein the method uses an intelligent search engine comprising a semantic analysis engine and a query engine; the semantic analysis engine parses the query entities, the known attribute conditions, and the variable attributes, together with the relationships among them, and performs logical reasoning over entity associations to generate a structured semantic representation that accurately describes the user's query intent;
the query engine automatically generates and executes the corresponding database query statement (SQL) from the output of the semantic analysis engine and returns the query result.
CN202110917325.6A 2021-08-11 2021-08-11 Intelligent search method based on natural language processing Pending CN113849596A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110917325.6A CN113849596A (en) 2021-08-11 2021-08-11 Intelligent search method based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110917325.6A CN113849596A (en) 2021-08-11 2021-08-11 Intelligent search method based on natural language processing

Publications (1)

Publication Number Publication Date
CN113849596A (en) 2021-12-28

Family

ID=78975755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110917325.6A Pending CN113849596A (en) 2021-08-11 2021-08-11 Intelligent search method based on natural language processing

Country Status (1)

Country Link
CN (1) CN113849596A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925118A (en) * 2022-06-09 2022-08-19 北京百度网讯科技有限公司 Cross-table search method, device, equipment and storage medium
CN114925118B (en) * 2022-06-09 2023-05-16 北京百度网讯科技有限公司 Cross-table searching method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102087669B (en) Intelligent search engine system based on semantic association
US20160041986A1 (en) Smart Search Engine
KR101646754B1 (en) Apparatus and Method of Mobile Semantic Search
US12007939B1 (en) Method and apparatus for determining search result demographics
KR20120073229A (en) Trusted query system and method
CN105045852A (en) Full-text search engine system for teaching resources
CN101464897A (en) Word matching and information query method and device
US11487795B2 (en) Template-based automatic software bug question and answer method
JPWO2020023787A5 (en)
CN111061828B (en) Digital library knowledge retrieval method and device
CN110674229A (en) AST-based relational database SQL table relational analysis and display method
CN101393565A (en) Facing virtual museum searching method based on noumenon
CN102253930A (en) Method and device for translating text
CN112883165B (en) Intelligent full-text retrieval method and system based on semantic understanding
CN102662936A (en) Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning
CN112036178A (en) Distribution network entity related semantic search method
US20050065920A1 (en) System and method for similarity searching based on synonym groups
CN113849596A (en) Intelligent search method based on natural language processing
CN112183110A (en) Artificial intelligence data application system and application method based on data center
Мезенцева et al. Optimization of analysis and minimization of information losses in text mining
CN111709239A (en) Geoscience data discovery method based on expert logic structure tree
CN114297350B (en) Urban domain knowledge model query method and device oriented to natural language
CN115827829B (en) Ontology-based search intention optimization method and system
Patil et al. Database keyword search: a perspective from optimization
Gao Integration, Provenance, and Temporal Queries for Large-Scale Knowledge Bases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination