CN112487154B - Intelligent search method based on natural language

Info

Publication number: CN112487154B
Application number: CN202011548364.5A
Authority: CN
Inventors: 杨光; 贺珊
Original assignee: Wuhan Fiberhome Digital Technology Co Ltd
Current assignee: Wuhan Fiberhome Digital Technology Co Ltd
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2023-03-31
Anticipated expiration: 2040-12-24
Also published as: CN112487154A

Abstract

The invention provides an intelligent search method based on natural language. Search intentions related to a service are collated and classified by entity, attribute and general tag, and a search intention knowledge graph is constructed in advance. The method comprises the following steps: receiving a search text input by a user; identifying an original search intention in the search text, and performing standardized conversion on it using the search intention knowledge graph to obtain the corresponding search entity as the final search intention; identifying original search conditions in the search text, performing standardized conversion on them using the search intention knowledge graph, and taking the converted search conditions as the final search conditions; identifying a search scene in the search text through a scene recognition model; generating a search action according to the final search intention, the final search conditions and the search scene, and passing the search action to a search engine for searching; and returning the search result of the search engine to the user. The invention can improve the accuracy of natural language search.

Description

Intelligent search method based on natural language

Technical Field

The invention relates to the technical field of information search, in particular to an intelligent search method based on natural language, electronic equipment and a readable storage medium.

Background

Currently, most information search forms are mainly classified into two types: one is that the user is required to explicitly specify the search target and the search condition to complete the search, and although the search form can explicitly specify the search target and the search condition to accurately complete the whole search process, the switching between different search scenes is difficult; the other is a full text search based on a search sentence given by a user, and although the problem of search scene switching is avoided by means of text retrieval, certain deviation exists in recognition of search conditions and understanding of the whole search intention.

Disclosure of Invention

The invention aims to provide an intelligent searching method based on natural language, electronic equipment and a readable storage medium, which aim to solve the problems of recognition and standardized conversion of search intents, search scenes and search conditions in natural language search in the prior art. The invention is realized by the following steps:

To achieve the above purpose, the invention provides an intelligent search method based on natural language, in which the search intentions related to a service are collated and classified by entity, attribute and general tag, and a search intention knowledge graph is constructed in advance;

the intelligent search method based on natural language comprises the following steps:

receiving a search text input by a user;

identifying an original search intention in the search text, and performing standardized conversion on the original search intention by adopting the search intention knowledge graph to obtain a corresponding search entity as a final search intention;

identifying original search conditions in the search text, carrying out standardized conversion on the original search conditions by adopting the search intention knowledge graph, and taking the converted search conditions as final search conditions;

identifying a search scene in the search text through a scene identification model;

generating a search action according to the final search intention, the final search condition and the search scene, and transmitting the search action into a search engine for searching;

and returning the search result of the search engine to the user.

Further, in the above intelligent search method based on natural language, the identifying an original search intention in the search text includes:

identifying the category and boundary of each lexical entity in the search text through a named entity recognition model, then performing dependency syntax analysis, and extracting the main part of the search text from the dependency parse as the original search intention.

Further, in the above intelligent search method based on natural language, the named entity recognition model is trained with ALBERT-tiny and BiLSTM + CRF.

Further, in the above intelligent search method based on natural language, the UD_Chinese-GSD Chinese dependency treebank is used to perform Chinese dependency syntax analysis.

Further, in the above intelligent search method based on natural language, the identifying original search conditions in the search text includes:

and identifying the search text through a named entity identification model, and recording the attribute name and the attribute value in the search text as the original search condition of the search text.

Further, in the above intelligent search method based on natural language, the standardized conversion of the original search condition by using the search intention knowledge graph includes:

normalizing the attribute values of the identified original search conditions, and normalizing the attribute names of the original search conditions by traversing the search intention knowledge graph to find the normalized attribute field corresponding to the entity target.

Further, in the above intelligent search method based on natural language, the search scenario is a combination pattern between different search conditions;

the identifying the search scenes in the search text through the scene recognition model comprises:

and identifying AND, OR, NOT and combination patterns among different search conditions in the final search condition through the scene identification model, wherein the scene identification model is a deep learning model.

Further, in the above intelligent search method based on natural language, the generating a search action according to the final search intention, the final search condition and the search scenario includes:

determining a search action template according to the search scene, determining a search data source according to the entity in the final search intention, and sequentially filling different search conditions in the final search conditions into the search action template respectively to generate search actions.

Further, in the above intelligent search method based on natural language, the search intention knowledge graph is stored in a JanusGraph database.

Further, in the above intelligent search method based on natural language, the search engine is Elasticsearch.

To achieve the above object, the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the natural language based intelligent search method described in any one of the above.

To achieve the above object, the present invention further provides a readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the natural language based intelligent search method according to any one of the above items.

Compared with the prior art, the invention has the following beneficial effects:

compared with other natural language searching methods, the method has the innovative points that the natural language processing technology and the knowledge graph technology are combined to solve the problems of recognition and standard conversion of the searching intention, the searching scene and the searching condition in the natural language searching, and the like, and can enhance the recognition capability of the searching intention and the searching scene based on less model training cost, thereby improving the accuracy of the natural language searching.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description will be briefly introduced, and it is obvious that the drawings in the following description are an embodiment of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts according to the drawings:

FIG. 1 is a flowchart of an intelligent search method based on natural language according to an embodiment of the present invention;

FIG. 2 is an overall flow diagram of the present invention;

FIG. 3a is a schematic view of a search intention knowledge graph;

FIG. 3b is a schematic view of a search intention knowledge graph with an entity of "human";

FIG. 4a is an intent recognition and translation flow diagram;

FIG. 4b is an exemplary diagram of the recognition and analysis of search text;

FIG. 5 is a flow chart of conditional recognition and normalization conversion;

FIG. 6 is a search action generation and search flow diagram.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The core idea of the invention is to address the problems of the prior art with a natural language intelligent search method that incorporates a knowledge graph. The method is widely applicable in fields such as security, finance and the internet; it understands the user's search intention from natural language and automatically identifies the search scene and the search conditions, thereby realizing intelligent search based on natural language. The user does not need to manually specify the search intention, search scene or search conditions. The original search intention and search conditions implicit in the search text are identified automatically through natural language processing techniques such as part-of-speech extraction, syntactic dependency analysis and named entity recognition; they are then converted into standardized form using the search intention knowledge graph and a general standardized conversion interface, and a corresponding search action is generated. This completes the whole search process, and the search result is returned to the user.

In the intelligent search method based on natural language provided in an embodiment of the present invention, search intents related to a service need to be sorted, classified according to entities, attributes, and general tags, and a search intention knowledge graph is constructed in advance.

As shown in fig. 1, an intelligent search method based on natural language provided in an embodiment of the present invention includes the following steps:

S100: receiving a search text input by a user;

S200: identifying an original search intention in the search text, and performing standardized conversion on the original search intention using the search intention knowledge graph to obtain the corresponding search entity as the final search intention;

S300: identifying original search conditions in the search text, performing standardized conversion on the original search conditions using the search intention knowledge graph, and taking the converted search conditions as the final search conditions;

S400: identifying a search scene in the search text through a scene recognition model;

S500: generating a search action according to the final search intention, the final search conditions and the search scene, and passing the search action to a search engine for searching;

S600: returning the search result of the search engine to the user.

Compared with other natural language search methods, the method has the innovation points that the natural language processing technology and the knowledge graph technology are combined to solve the problems of recognition and standard conversion of search intentions, search scenes and search conditions in natural language search, and the like, and can enhance the recognition capability of the search intentions and scenes on the basis of less model training cost, so that the accuracy of the natural language search is improved.

The following describes the technical solution for implementing the present invention in detail with reference to fig. 2 to 6:

First, the search intention knowledge graph is constructed as follows:

The knowledge graph is chosen as the carrier of the search intention tree mainly because, as a semantic network, it has natural advantages in expressing the semantics of natural language. The search intentions related to the service are collated and classified by entity, attribute and general tag. Suppose a service has the following search scenarios for persons, with the original search intentions shown in Table 1 below:

TABLE 1 Original search intentions of the search texts

Serial number	Search text	Original search intention
1	420111190010100010	Search by ID card number
2	Zhang San	Search by name attribute
3	Male	Search by gender attribute
4	People who like to eat sweet food	Search by person tag

As the original search intentions show, whether the user searches by ID card number, name, gender or tag, the purpose is to find information about the person entity, so the original search intention must be linked to that entity. The method therefore constructs a corresponding knowledge graph to establish the semantic link between intention and entity; the search intention knowledge graph is shown in FIG. 3a and FIG. 3b. In FIG. 3b, "ID card number", "name" and "snacks" are attributes and tags of the entity "person". When the original search intention is one of these attributes or tags, the graph allows it to be converted quickly, in standardized form, to the search entity.
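The link from attributes and tags to their entity can be illustrated with a small in-memory stand-in. This is only a sketch: the class `IntentGraph`, the relation names `attribute_of` and `tag_of`, and the English terms are assumptions made for illustration, and a plain Python dictionary is used instead of a graph database.

```python
from collections import defaultdict


class IntentGraph:
    """Tiny in-memory stand-in for a search intention knowledge graph (illustrative only)."""

    def __init__(self):
        self._entities = set()
        # term -> set of (relation, entity) pairs
        self._edges = defaultdict(set)

    def add_entity(self, entity):
        self._entities.add(entity)

    def link(self, term, relation, entity):
        """Link an attribute or tag term to the entity it belongs to."""
        if entity not in self._entities:
            raise KeyError(f"unknown entity: {entity}")
        self._edges[term].add((relation, entity))

    def to_entity(self, term):
        """Convert an original search intention to its search entity, or None if unknown."""
        if term in self._entities:
            return term
        targets = sorted(entity for _, entity in self._edges.get(term, ()))
        return targets[0] if targets else None


graph = IntentGraph()
graph.add_entity("person")
graph.link("ID card number", "attribute_of", "person")
graph.link("name", "attribute_of", "person")
graph.link("gender", "attribute_of", "person")
graph.link("likes sweet food", "tag_of", "person")

for intent in ["ID card number", "name", "gender", "person"]:
    print(intent, "->", graph.to_entity(intent))
```

Every attribute or tag resolves to the same entity, which is the conversion the graph is meant to provide; a term that is not in the graph yields `None` so the caller can decide how to handle it.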

In step S200, the method of intention recognition for the search text is as follows:

The method identifies the original search intention in the search text mainly through named entity recognition and dependency syntax analysis. Specifically, the category and boundary of each lexical entity in the search text are identified through a named entity recognition model, dependency syntax analysis is then performed, and the main part of the search text is extracted from the dependency parse as the original search intention. The named entity recognition model mainly identifies the categories and boundaries of the lexical entities in the search phrase; in the example above, the entity words are "likes to eat sweet food" (tag) and "people" (entity). Dependency syntax analysis is a commonly used natural language processing method for analyzing the head-dependent relationships between the words of a phrase. It relies on a Chinese dependency treebank (a linguistic resource consisting of a large number of manually annotated dependency syntax trees); the UD_Chinese-GSD Chinese dependency treebank is preferably used. After dependency syntax analysis of the search phrase, the main part of the search text is extracted as the original search intention. As shown in FIG. 4b, for the search text "people who like to eat sweet food", "people" is finally identified as the original search intention.
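Extracting the main part from a dependency parse can be sketched as picking the root token. The parse below is hand-annotated for an English gloss of the example phrase and is an assumption for illustration; in practice it would come from a parser trained on a treebank. The names `Token` and `main_part` are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the phrase
    form: str     # surface word
    head: int     # index of the head token; 0 marks the root
    deprel: str   # dependency relation label


def main_part(tokens: List[Token]) -> str:
    """Return the root word of the parse, taken here as the main part of the phrase."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root token")
    return roots[0].form


# Hand-written parse of "people who like sweet food" (illustrative annotation).
parse = [
    Token(1, "people", 0, "root"),
    Token(2, "who", 3, "nsubj"),
    Token(3, "like", 1, "acl:relcl"),
    Token(4, "sweet", 5, "amod"),
    Token(5, "food", 3, "obj"),
]

print(main_part(parse))
```

The relative clause attaches to "people" as a modifier, so the root of the phrase, and therefore the original search intention in this sketch, is "people".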

In step S200, the method of the search intention conversion based on the intention knowledge graph is as follows:

The original search intention identified in the search text can take several forms: it may be an entity, an entity attribute or an entity tag, whereas the final search intention always points to an entity. To resolve this divergence, the method converts the identified original search intention using the search intention knowledge graph and obtains the corresponding search entity as the final search intention.

TABLE 2 Final search intentions after standardized conversion

Serial number	Search text	Original search intention	Converted intention
1	420111190010100010	ID card number	Person entity
2	Zhang San	Name	Person entity
3	Male	Gender	Person entity
4	People who like to eat sweet food	Person	Person entity

In step S300, the method for identifying the search condition specifically includes:

Search conditions are identified and extracted using multi-strategy fusion. For the search conditions implicit in the search text, the method combines a dictionary, rules and a sequence labeling model (ALBERT-tiny + BiLSTM + CRF), that is, the named entity recognition model, and records the identified attribute names and attribute values as the original search conditions of the search text.
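One way to picture multi-strategy fusion is as several independent extractors whose outputs are merged. The sketch below is an assumption-laden illustration: the dictionary contents, the month-year rule, the attribute names and the function names are all hypothetical, and the sequence labeling model is replaced by a placeholder that returns nothing.

```python
import re
from typing import Callable, List, Tuple

Condition = Tuple[str, str]  # (raw attribute name, raw attribute value)

# Hypothetical dictionary; a real system would load business vocabularies.
DICTIONARY = {
    "red": "color",
    "blue": "color",
    "jeep": "vehicle style",
    "sedan": "vehicle style",
}

# Hypothetical rule: a month name followed by a four-digit year.
MONTH_YEAR = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+(\d{4})\b"
)


def dictionary_strategy(text: str) -> List[Condition]:
    """Look every lower-cased word up in the dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(DICTIONARY[w], w) for w in words if w in DICTIONARY]


def rule_strategy(text: str) -> List[Condition]:
    """Pick up 'Month YYYY' expressions with a regular expression."""
    return [("time", m.group(0)) for m in MONTH_YEAR.finditer(text)]


def model_strategy(text: str) -> List[Condition]:
    """Placeholder for the sequence labeling model; it finds nothing in this sketch."""
    return []


STRATEGIES: List[Callable[[str], List[Condition]]] = [
    dictionary_strategy,
    rule_strategy,
    model_strategy,
]


def extract_conditions(text: str) -> List[Condition]:
    """Merge the outputs of all strategies, dropping duplicates while keeping order."""
    merged: List[Condition] = []
    for strategy in STRATEGIES:
        for cond in strategy(text):
            if cond not in merged:
                merged.append(cond)
    return merged


print(extract_conditions("Red jeep"))
print(extract_conditions("Person born in January 2020"))
```

Keeping each strategy as a separate function makes it possible to add or replace one of them, for example swapping the placeholder for a trained model, without touching the merge step.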

TABLE 3 Original search conditions of the search texts

Serial number	Search text	Original search condition
1	Person born in January 2020	January 2020
2	Red jeep	Red jeep
3	Recent news	Recent

In step S300, the standardized conversion of the search conditions is performed as follows:

The original search conditions identified through multi-strategy fusion cannot be used directly for querying, so they must be converted into standardized form according to the query scene and query intention. The invention performs this conversion using the search intention knowledge graph and a general standardized conversion module. Specifically, the attribute value of each identified original search condition is normalized, and its attribute name is normalized by traversing the search intention knowledge graph to find the entity target and the corresponding normalized attribute field.

For each identified original condition, the normalized attribute field corresponding to its entity target is found by traversing the search intention knowledge graph. For example, for the original condition "January 2020" contained in the search text "person born in January 2020", the normalized attribute field is "date of birth". Some original conditions still cannot be used directly for searching after the normalized attribute field has been identified, so their values must also be normalized by the general standardization module. For example, the attribute field "date of birth" is labeled as a date type, and the original value "January 2020" cannot be used directly for searching, so it must be converted into a usable date range. Only search conditions that have undergone standardized conversion can be used for searching.

TABLE 4 Search conditions after standardized conversion

Serial number	Search text	Original search condition	Normalized search condition
1	Person born in January 2020	January 2020	2020-01-01 < date of birth < 2020-01-31
2	Red jeep	Red jeep	Body color = red AND style = jeep
3	Recent news	Recent	Time of occurrence = most recent
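The two normalization steps, attribute name and attribute value, can be sketched for the date example as follows. The lookup table `FIELD_MAP`, its keys and the function names are hypothetical; a dictionary stands in for the traversal of the knowledge graph, and only the month-to-range conversion is shown.

```python
import calendar
from datetime import date
from typing import Dict, Tuple

# Hypothetical mapping: (entity, raw attribute name) -> normalized attribute field.
FIELD_MAP: Dict[Tuple[str, str], str] = {
    ("person", "born"): "date of birth",
    ("vehicle", "color"): "body color",
}


def normalize_field(entity: str, raw_name: str) -> str:
    """Look up the normalized attribute field for an entity and a raw attribute name."""
    return FIELD_MAP[(entity, raw_name)]


def month_range(year: int, month: int) -> Tuple[date, date]:
    """Convert a year and month into the first and last day of that month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


start, end = month_range(2020, 1)
field = normalize_field("person", "born")
print(f"{start.isoformat()} < {field} < {end.isoformat()}")
```

Using `calendar.monthrange` avoids hard-coding month lengths, so a leap-year February is handled without special cases.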

In step S400, the method for identifying the search scene specifically includes:

A search scene is defined as a combination pattern among different search conditions. A scene recognition model is obtained by training a corresponding deep learning model, and this model identifies the AND, OR and NOT relations and their combination patterns among the different search conditions in the final search conditions.

TABLE 5 search Scenario and Condition combination Pattern

In step S500, the method of generating the search query action is as follows:

A search action generator produces the corresponding search action from the final search intention, the search scene and the final search conditions generated in the preceding steps, and passes it to the search engine to complete the search process. Specifically, the generator determines the search action template according to the search scene, determines the search data source according to the entity in the final search intention, and fills the individual search conditions of the final search conditions into the template in order, which limits the target search range. For example, the final search intention, search scene and final search conditions of the search text "person born in January 2020" are shown in Table 6 below:

TABLE 6 Final search intention, search scene and final search condition for "person born in January 2020"

Search text	Person born in January 2020
Final search intention	Personnel
Search scene	Conditional search
Final search condition	2020-01-01 < date of birth < 2020-01-31

The search pseudo code generated by the search action generator is then:

SELECT * FROM 'personnel'
WHERE 'birth date' > '2020-01-01'
AND 'birth date' < '2020-01-31'
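Since the search engine in the example system is Elasticsearch, the same search action could plausibly be expressed as a query body built from a per-scene template. This is a sketch under stated assumptions: the index name `person`, the field name `birth_date`, the scene labels and the helper functions are hypothetical, and a single condition is treated as an AND scene with one member. Only the `bool` and `range` query shapes come from the Elasticsearch query DSL; no client call is made.

```python
import json
from typing import Any, Dict, List

# Assumed mapping from a recognized scene to a bool-query clause.
CLAUSE_FOR_SCENE = {"AND": "must", "OR": "should", "NOT": "must_not"}


def range_condition(field: str, gt: str, lt: str) -> Dict[str, Any]:
    """Build a range condition with exclusive lower and upper bounds."""
    return {"range": {field: {"gt": gt, "lt": lt}}}


def build_search_action(entity_index: str, scene: str,
                        conditions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fill the conditions into the bool-query template chosen by the scene."""
    clause = CLAUSE_FOR_SCENE[scene]
    bool_query: Dict[str, Any] = {clause: list(conditions)}
    if clause == "should":
        # Require at least one OR branch to match.
        bool_query["minimum_should_match"] = 1
    return {"index": entity_index, "body": {"query": {"bool": bool_query}}}


action = build_search_action(
    "person",
    "AND",
    [range_condition("birth_date", "2020-01-01", "2020-01-31")],
)
print(json.dumps(action, indent=2))
```

The entity selects the data source (here the index), the scene selects the template (the bool clause), and the conditions fill the template, mirroring the three roles described for the search action generator.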

In step S600, the method for returning the search result is as follows:

The search action generated in step S500 is passed to the search engine for searching, and the corresponding search results are ranked and returned to the user.

The invention is described below by way of example, using a natural language search system built from a named entity recognition model (ALBERT-tiny + BiLSTM + CRF), JanusGraph and Elasticsearch.

In this example, the named entity recognition model is trained with ALBERT-tiny + BiLSTM + CRF; the trained model is loaded and exposes a named entity recognition service. JanusGraph serves as the graph database component for the search intention knowledge graph: the business-related words that represent entities, attributes and tags are collated and linked according to membership and synonymy relations to construct the search intention knowledge graph, which is stored in the JanusGraph graph database and exposes services such as search-entity lookup and intention conversion. The UD_Chinese-GSD Chinese dependency treebank is used for Chinese dependency syntax analysis. Elasticsearch serves as the search engine.

The specific implementation process is as follows:

1. After dependency analysis of the search text using the Chinese dependency treebank, the original search intention of the search text is obtained and converted using the search intention knowledge graph to obtain the final search intention;

2. After named entity recognition analysis of the search text, a list of original search conditions is obtained; normalizing the attribute names and attribute values in this list yields a list of standard search conditions that can be used directly for searching;

3. Using the search condition list obtained in step 2, the search scene contained in the search text is identified through the scene recognition model; the search scene is mainly the combination pattern of the conditions and is used to generate the search action template;

4. The search action generator takes the search action template generated in step 3, uses the data source linked to the final search intention obtained in step 1 as the restricted data source, fills the search conditions in the standard search condition list into the template in order to generate the search action, and passes the search action to the search engine;

5. The search engine Elasticsearch completes the search according to the search script in the search action obtained in step 4 and returns the search result to the user.

The related art referred to above is described below.

ALBERT-tiny: BERT is a model proposed by Google for generating word vectors, and ALBERT is an improved version of BERT that effectively reduces memory usage and increases training speed. ALBERT-tiny is a lightweight ALBERT model with only 4 hidden layers and about 1.8M parameters. Compared with BERT, its training and inference speed is about 10 times faster while accuracy is largely preserved.

BiLSTM + CRF: LSTM stands for long short-term memory, a model well suited to sequential data such as text. A BiLSTM combines a forward LSTM and a backward LSTM. A CRF (conditional random field) is a sequence labeling algorithm. In natural language processing, the BiLSTM + CRF model is often used to solve sequence labeling problems such as named entity recognition.

Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and was released as open source under the Apache license; it is a popular enterprise search engine.

JanusGraph: JanusGraph is a highly scalable distributed graph database dedicated to storing and querying graphs containing hundreds of millions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database and can support thousands of concurrent users performing complex graph traversals in real time.

Based on the same inventive concept, the present invention further provides an electronic device, which includes a processor and a memory, wherein the memory stores a computer program, and the processor implements the intelligent search method based on natural language as described above when executing the computer program.

The processor may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor is typically used to control the overall operation of the electronic device. In this embodiment, the processor is configured to execute the program code stored in the memory or process data, such as the program code of the intelligent search method based on natural language.

The memory includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory may be an internal storage unit of the electronic device, such as a hard disk or internal memory of the electronic device. In other embodiments, the memory may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the electronic device. Of course, the memory may also include both internal and external storage units of the electronic device. In this embodiment, the memory is generally used to store the operating system installed in the electronic device and various types of application software, such as the program code of the intelligent search method based on natural language. Further, the memory may be used to temporarily store various types of data that have been output or are to be output.

Based on the same inventive concept, the present invention also provides a readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the intelligent search method based on natural language as described above.

In summary, the intelligent search method, the electronic device and the readable storage medium based on natural language provided by the present invention have the following advantages and positive effects: compared with other natural language searching methods, the method has the innovative points that the natural language processing technology and the knowledge graph technology are combined to solve the problems of recognition and standard conversion of the searching intention, the searching scene and the searching condition in the natural language searching, and the like, and can enhance the recognition capability of the searching intention and the searching scene based on less model training cost, thereby improving the accuracy of the natural language searching.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An intelligent search method based on natural language, characterized in that search intentions related to services are sorted and classified according to entities, attributes and general labels, and a search intention knowledge graph is constructed in advance;

receiving a search text input by a user;

returning the search result of the search engine to the user;

the identifying of the original search condition in the search text comprises:

identifying the search text through a named entity recognition model, and recording an attribute name and an attribute value in the search text as an original search condition of the search text;

the standardized conversion of the original search condition by adopting the search intention knowledge graph comprises the following steps:

standardizing attribute values of each identified original search condition, searching a body target of the search intention knowledge graph and a standardized attribute field corresponding to the body target by traversing the search intention knowledge graph, and standardizing attribute names in the search intention knowledge graph;

the search scene is a combination mode among different search conditions;

identifying "and", "or", "not" and combined patterns among the different search conditions in the final search condition through the scene identification model, wherein the scene identification model is a deep learning model;

generating a search action according to the final search intention, the final search condition and the search scene, including:

2. The intelligent natural language-based search method of claim 1, wherein said identifying an original search intent in said search text comprises:

3. The intelligent natural language-based search method of claim 2, wherein said named entity recognition model is trained using ALBERT-tiny and BiLSTM + CRF.

4. The intelligent natural language-based search method of claim 2, wherein the Chinese syntactic dependency analysis is performed using the UD_Chinese-GSD Chinese dependency treebank.

5. The intelligent natural language-based search method of claim 1, wherein the search intention knowledge graph is stored in a JanusGraph database.

6. The intelligent natural language-based search method of claim 1, wherein the search engine is Elasticsearch.

7. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1 to 6.

8. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 6.