CN116484008A - Intelligent question recommendation method based on knowledge graph and vector search engine

Info

Publication number: CN116484008A
Application number: CN202211698717.9A
Authority: CN
Inventors: 张博林; 侯国强
Original assignee: Zhixue Huijiao Hubei Education Technology Co ltd
Current assignee: Zhixue Huijiao Hubei Education Technology Co ltd
Priority date: 2022-12-28
Filing date: 2022-12-28
Publication date: 2023-07-25

Abstract

The invention relates to the technical field of artificial intelligence and discloses an intelligent question recommendation method based on a knowledge graph and a vector search engine, comprising the following steps: first, Query preprocessing; second, Mention recognition; third, entity linking; fourth, block search; fifth, question recommendation. The knowledge graph serves as a bridge between the Query and the questions, relating the user's intent to the questions in the question bank. The Milvus vector search engine is used to build one Partition per entity, and vector similarity search is performed only in the Partitions corresponding to the entities obtained by entity linking, which speeds up the search.

Description

Intelligent question recommendation method based on knowledge graph and vector search engine

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an intelligent question recommendation method based on a knowledge graph and a vector search engine.

Background

Current question recommendation methods are mostly keyword-based, and their accuracy is questionable. Methods based on vector similarity also exist, but once the question bank grows beyond a certain size, the computation time becomes a concern.

The invention uses a knowledge graph to relate the user's intent to the questions, and uses the vector search engine Milvus, partitioned by entity, to speed up the search.

Therefore, an intelligent question recommendation method based on a knowledge graph and a vector search engine is proposed.

Disclosure of Invention

The invention mainly addresses the technical problems of the prior art and provides an intelligent question recommendation method based on a knowledge graph and a vector search engine.

To achieve this purpose, the invention adopts the following technical scheme: an intelligent question recommendation method based on a knowledge graph and a vector search engine, comprising the following steps:

the first step: Query preprocessing;

the second step: Mention recognition;

the third step: entity linking;

the fourth step: block search;

the fifth step: question recommendation.

Preferably, the Query preprocessing of the first step includes: the Query entered by the user may contain non-standard content, including full-width characters, traditional Chinese characters, invalid characters and the like, so the Query needs to be preprocessed;

the preprocessing of the Query comprises full-width to half-width conversion, traditional-to-simplified Chinese conversion and removal of invalid characters; in addition, case conversion is performed to keep the Query consistent with the Mention-Entity dictionary.
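The preprocessing operations listed above can be sketched as follows. This is only an illustration under stated assumptions: the function name `preprocess_query`, the tiny traditional-to-simplified table `TRAD_TO_SIMP`, and the definition of "invalid characters" in `INVALID` are hypothetical and do not come from the patent, and lower-casing is assumed as the direction of case conversion.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny traditional-to-simplified table; a real
# system would use a complete conversion resource instead.
TRAD_TO_SIMP = str.maketrans({"數": "数", "學": "学", "題": "题"})

# Assumption: "invalid characters" means anything that is not a CJK
# ideograph, an ASCII letter or digit, or a space.
INVALID = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9 ]+")


def preprocess_query(query: str) -> str:
    # NFKC normalization folds full-width letters, digits, punctuation and
    # spaces into their half-width equivalents.
    text = unicodedata.normalize("NFKC", query)
    # Traditional-to-simplified conversion (illustrative table only).
    text = text.translate(TRAD_TO_SIMP)
    # Replace runs of invalid characters with a space, then tidy whitespace.
    text = INVALID.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Case conversion, assumed here to be lower-casing.
    return text.lower()
```

Whatever direction of case conversion is chosen, the keys of the Mention-Entity dictionary would need the same normalization so that lookups match.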

Preferably, the Mention recognition of the second step includes: identifying all Mentions in the preprocessed Query by using a Mention-Entity dictionary;

wherein the Mention recognition loads the Mention-Entity dictionary into an AC (Aho-Corasick) automaton and searches the preprocessed Query for Mentions.
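To make the Mention search concrete, here is a minimal pure-Python sketch of an Aho-Corasick automaton built from the keys of a Mention-Entity dictionary. The class name `ACAutomaton` and the sample Mentions are hypothetical; the patent does not say which implementation is used.

```python
from collections import deque


class ACAutomaton:
    """Minimal Aho-Corasick automaton over a list of patterns."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-state transitions: char -> state
        self.fail = [0]    # per-state failure link
        self.out = [[]]    # per-state patterns that end at this state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find(self, text):
        """Return (start_index, pattern) for every occurrence in text."""
        hits, state = [], 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


# Hypothetical Mentions; in the method they would be the dictionary keys.
ac = ACAutomaton(["勾股定理", "定理", "三角形"])
mentions = ac.find("勾股定理和直角三角形")
```

The automaton scans the Query once regardless of how many Mentions the dictionary holds, which is presumably why it is preferred over testing each Mention separately.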

Preferably, the entity linking of the third step includes: obtaining the entities related to each identified Mention from the Mention-Entity dictionary to form a candidate Entity set; then, to obtain the entities related to the Query, calculating the similarity between every Entity in the candidate Entity set and the Query, and taking the Entities whose similarity score is higher than a threshold as the Query-related Entities;

wherein the similarity measure is cosine similarity: the Query and the candidate entities are converted into vectors, and the cosine similarity between the two vectors is calculated;

wherein the Query is converted into a vector as follows: 1) the Query is segmented into words using jieba; 2) stop words are removed; 3) the remaining tokens are converted into a vector;

wherein the vector conversion method for the Query and the candidate entities is the same as that used in question preprocessing;

wherein the similarity score threshold is 0.95; to avoid returning nothing, when no entity exceeds the threshold, the top-k entities by score are returned, where k is the number of Mentions (maximum matching).
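The thresholding and fallback logic of entity linking might look like the sketch below. It assumes the Query and candidate entities have already been converted into vectors, and it reads "k" as the number of recognized Mentions; the function names `cosine` and `link_entities` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entities(query_vec, candidates, num_mentions, threshold=0.95):
    """candidates maps entity name -> vector; returns the linked entities."""
    scored = sorted(
        ((cosine(query_vec, vec), entity) for entity, vec in candidates.items()),
        reverse=True,
    )
    linked = [entity for score, entity in scored if score > threshold]
    if not linked:
        # Fallback so that something is always returned: take the top-k
        # entities by score, with k assumed to be the number of Mentions.
        linked = [entity for _, entity in scored[:num_mentions]]
    return linked
```

For example, with a Query vector `[1.0, 0.0]`, a candidate `[1.0, 0.1]` scores about 0.995 and passes the threshold, whereas candidates `[0.0, 1.0]` and `[1.0, 1.0]` both fall below it and trigger the top-k fallback.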

Preferably, the block search of the fourth step includes: according to the Query-related entities, loading the Partitions corresponding to those entities in the Collection, performing similarity search within the Partitions, and returning the question ids together with the similarity scores between the questions and the Query, sorted by similarity score from high to low;

wherein the Partitions are obtained in the preprocessing stage; Partitions and entities correspond one-to-one, and each Partition stores the vectors of the questions related to its entity;

wherein the similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20 and 'nprobe' = 3, which respectively denote the vector index type, the distance metric, the number of clusters, and the number of nearest clusters searched during similarity search.
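As a rough illustration, the parameter set above can be written in the dictionary shape that the pymilvus client accepts, assuming the index type intended is Milvus's IVF_FLAT. The field name `"embedding"` and the partition name in the comments are hypothetical, and the patent does not state whether vectors are normalized; the helper `l2_normalize` is included only because inner product (IP) ranks results like cosine similarity when vectors have unit length.

```python
import math

# Index settings, in the shape pymilvus expects for create_index.
INDEX_PARAMS = {
    "index_type": "IVF_FLAT",   # assumed reading of the index type in the text
    "metric_type": "IP",        # inner product
    "params": {"nlist": 20},    # number of clusters built by the index
}

# Search settings, in the shape pymilvus expects for search.
SEARCH_PARAMS = {
    "metric_type": "IP",
    "params": {"nprobe": 3},    # number of clusters probed per query
}


def l2_normalize(vec):
    """Scale a vector to unit length so that IP equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


# With a running Milvus server and a pymilvus Collection object `collection`,
# the dictionaries would be passed roughly like this (not executed here):
#   collection.create_index(field_name="embedding", index_params=INDEX_PARAMS)
#   collection.search(data=[l2_normalize(query_vec)], anns_field="embedding",
#                     param=SEARCH_PARAMS, limit=10,
#                     partition_names=["entity_partition"])
```

With nlist = 20 and nprobe = 3, each search scans roughly 3 of the 20 clusters in a Partition, trading a little recall for speed.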

Preferably, the question recommendation of the fifth step includes: from the results [question id, similarity score] obtained by the block search, taking the questions whose score is higher than a predefined threshold as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.
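The final filtering step could be sketched as below. The function name `recommend` and the threshold value used in the example are hypothetical, since the patent only says the threshold is predefined; "returned randomly" is read here as returning the related questions in shuffled order.

```python
import random


def recommend(search_results, threshold, rng=None):
    """search_results: list of (question_id, similarity_score) pairs.

    Returns the ids scoring above the threshold, in random order.
    """
    rng = rng or random.Random()
    related = [qid for qid, score in search_results if score > threshold]
    # Shuffle so that the same top-scoring questions are not always shown first.
    rng.shuffle(related)
    return related


# Hypothetical block-search output and threshold.
results = [("q1", 0.91), ("q2", 0.84), ("q3", 0.40)]
picked = recommend(results, threshold=0.5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the shuffle reproducible in tests while leaving production behavior random.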

Advantageous effects

The invention provides an intelligent question recommendation method based on a knowledge graph and a vector search engine, which has the following beneficial effects:

(1) In this intelligent question recommendation method based on a knowledge graph and a vector search engine, the knowledge graph serves as a bridge between the Query and the questions, relating the user's intent to the questions. The Milvus vector search engine is used to build one Partition per entity, and vector similarity search is performed only in the Partitions corresponding to the entities obtained by entity linking, which speeds up the search.

Drawings

FIG. 1 is a flow chart of the intelligent question recommendation method based on a knowledge graph and a vector search engine.

Detailed Description

An intelligent question recommendation method based on a knowledge graph and a vector search engine is shown in figure 1.

Preprocessing

1. Mention-Entity dictionary creation

A Mention-Entity dictionary is established based on the knowledge graph. A Mention is defined as a fragment of natural-language text that expresses an entity, including the entity's name, an alias, or a word that refers to the entity indirectly by expressing one of its important attributes.

The Mention-Entity dictionary is established as follows: the triples (head, relation, tail) of the graph are traversed, and if the relation is Chinese name, foreign name, alias, abbreviation, synonym or phonetic notation, the tail is added to the dictionary as a key with the head as its value; in addition, keywords are mined from the text expressing the entity's important attributes, using the pre-trained models provided by PaddleNLP and the jieba library, to obtain a set keys, and each element of the set is added to the dictionary as a key with the head as its value.
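The triple traversal described above can be illustrated with a short sketch. It assumes triples are (head, relation, tail) tuples and that relation names are available as the English labels shown; the function name `build_mention_entity_dict` and the choice of a set as the dictionary value (so that one Mention can map to several candidate entities) are assumptions, not details from the patent.

```python
from collections import defaultdict

# Naming relations whose tail is treated as a Mention of the head entity.
NAME_RELATIONS = {
    "Chinese name", "foreign name", "alias",
    "abbreviation", "synonym", "phonetic notation",
}


def build_mention_entity_dict(triples):
    """Map each Mention (tail of a naming relation) to its head entities."""
    mention_entity = defaultdict(set)
    for head, relation, tail in triples:
        if relation in NAME_RELATIONS:
            mention_entity[tail].add(head)
    return mention_entity


# Hypothetical triples for illustration only.
triples = [
    ("勾股定理", "alias", "毕达哥拉斯定理"),
    ("勾股定理", "foreign name", "Pythagorean theorem"),
    ("勾股定理", "discovered by", "毕达哥拉斯"),
]
mention_entity = build_mention_entity_dict(triples)
```

Keywords mined from attribute text would be added to the same dictionary in the same way, with the mined keyword as the key and the head entity as the value.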

The pre-trained models provided by PaddleNLP comprise a named entity recognition model and a part-of-speech analysis model.

The keywords are mined by manually defined rules.

The manually defined rules include rule 1: '[sensory feature|term class]', meaning that if the entity class of a token is 'sensory feature' or 'term class', the token is added to the set keys.

The manually defined rules include rule 2: '[modifier|vocabulary term][sensory feature]', meaning that if a token whose entity class is 'sensory feature' is preceded by a token whose entity class is 'modifier' or 'vocabulary term', the two tokens are combined into one word and added to the set keys.

The manually defined rules include rule 3: '[term class]{2}', meaning that if a token whose entity class is 'term class' is preceded by a token whose entity class is also 'term class', the two tokens are combined into one word and added to the set keys.

The manually defined rules include rule 4: jieba is used to extract the top 20 keywords by weight, and the part of speech of each keyword is analyzed; if it is a noun or a proper noun, the keyword is added to the set keys.

The manually defined rules include rule 5: if the part of speech of a word is noun or proper noun, and the word contains only Chinese characters or only English characters, the word is added to the set keys.
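Rules 1 to 3 operate on tokens that already carry an entity class, so they can be sketched without the tagging model itself. In the sketch below the class labels are English renderings ('sensory feature', 'term class', 'modifier', 'vocabulary term'), rule 3 is read as "two consecutive 'term class' tokens are merged", and the function name `mine_keywords` is hypothetical; the tagger output is supplied by hand rather than produced by PaddleNLP.

```python
def mine_keywords(tagged_tokens):
    """tagged_tokens: list of (token, entity_class) pairs in sentence order.

    Applies rules 1-3 and returns the set of mined keywords.
    """
    keys = set()
    for i, (token, cls) in enumerate(tagged_tokens):
        # Rule 1: keep any 'sensory feature' or 'term class' token.
        if cls in ("sensory feature", "term class"):
            keys.add(token)
        if i == 0:
            continue
        prev_token, prev_cls = tagged_tokens[i - 1]
        # Rule 2: modifier / vocabulary term followed by a sensory feature.
        if cls == "sensory feature" and prev_cls in ("modifier", "vocabulary term"):
            keys.add(prev_token + token)
        # Rule 3 (assumed reading): two consecutive 'term class' tokens.
        if cls == "term class" and prev_cls == "term class":
            keys.add(prev_token + token)
    return keys


# Hand-written tagger output for illustration only.
tagged = [
    ("很", "modifier"), ("甜", "sensory feature"),
    ("一元", "term class"), ("方程", "term class"),
]
keys = mine_keywords(tagged)
```

Tokens are joined without a separator, which suits Chinese text; English tokens would need a space between them.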

2. Question preprocessing

Question preprocessing comprises establishing a question Id-question vector dictionary, a question Id-question type dictionary, a question Id-question difficulty dictionary and an Entity-question Id dictionary, and updating the Mention-Entity dictionary.

The features of a question comprise its unique question Id, question content, question type, question difficulty and question analysis.

The question Id-question vector dictionary is established by taking the unique question Id as the dictionary key and the question content, converted into a vector, as the dictionary value. The vector conversion works as follows: pre-trained word vectors are loaded using the Gensim library; conversion takes the token as the basic unit; if the token exists in the pre-trained word vectors, its vector is used directly; if it does not, the vectors of the token's individual characters are obtained and their average is taken as the token's vector; finally, the average of all token vectors is taken as the result vector.
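The averaging scheme with character-level fallback can be sketched as follows. A plain dictionary stands in for the pre-trained word vectors loaded through Gensim, and the function names are hypothetical. The patent does not say what happens when a character is also missing from the vocabulary; this sketch skips such characters and falls back to a zero vector, which is an assumption.

```python
def mean_vector(vectors, dim):
    """Element-wise average of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def token_vector(token, word_vectors, dim):
    # Known token: use its pre-trained vector directly.
    if token in word_vectors:
        return list(word_vectors[token])
    # Unknown token: average the vectors of its individual characters.
    # Assumption: characters missing from the vocabulary are skipped.
    char_vecs = [word_vectors[ch] for ch in token if ch in word_vectors]
    return mean_vector(char_vecs, dim)


def text_vector(tokens, word_vectors, dim):
    """Average of all token vectors, used for questions, Queries and entities."""
    return mean_vector([token_vector(t, word_vectors, dim) for t in tokens], dim)


# Toy 2-dimensional vocabulary for illustration only.
wv = {"数学": [1.0, 0.0], "题": [0.0, 1.0], "目": [0.0, 3.0]}
vec = text_vector(["数学", "题目"], wv, dim=2)
```

Here "题目" is not in the toy vocabulary, so its vector is the average of "题" and "目", namely `[0.0, 2.0]`, and the text vector is the average of that and the vector of "数学".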

The question Id-question type dictionary is established by adding the unique question Id as the dictionary key and the question type as the dictionary value. There are five question types: single-choice, multiple-choice, fill-in-the-blank, short-answer and comprehensive questions; each question has exactly one question type.

The question Id-question difficulty dictionary is established by adding the unique question Id as the dictionary key and the question difficulty as the dictionary value. The question difficulty is numeric, ranging from 1 to 100.

The Entity-question Id dictionary is established by adding each Entity contained in the question content as the dictionary key and the unique question Id as the dictionary value.

The Mention-Entity dictionary is updated by adding the relations between the entities, keywords and valid words found in the question content and the Entities of the question; the entities, keywords and valid words are extracted from the question content in the same way as from the descriptive text of the tail Entity when the Mention-Entity dictionary is established.

3. Partition preloading

The vector search engine Milvus is used to speed up similarity calculation. A Collection is defined, and the question bank is divided into multiple Partitions by Entity; Partitions and Entities correspond one-to-one, and each Partition stores the vectors of the questions related to its Entity, taken from the question Id-question vector dictionary.
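The idea of partitioning by Entity can be shown with an in-memory stand-in, which is not Milvus itself: a dictionary of dictionaries plays the role of the Collection and its Partitions, and a brute-force inner product replaces the indexed search. The function names and sample data are hypothetical.

```python
def build_partitions(entity_to_qids, qid_to_vector):
    """Group question vectors by entity: one partition per entity."""
    partitions = {}
    for entity, qids in entity_to_qids.items():
        partitions[entity] = {
            qid: qid_to_vector[qid] for qid in qids if qid in qid_to_vector
        }
    return partitions


def search_partition(partitions, entity, query_vec, limit=10):
    """Rank the questions of one partition by inner product with the query."""
    rows = partitions.get(entity, {})
    scored = [
        (qid, sum(a * b for a, b in zip(query_vec, vec)))
        for qid, vec in rows.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


# Toy Entity-question Id and question Id-question vector dictionaries.
entity_to_qids = {"三角形": ["q1", "q2"], "方程": ["q3"]}
qid_to_vector = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [1.0, 1.0]}
partitions = build_partitions(entity_to_qids, qid_to_vector)
hits = search_partition(partitions, "三角形", [1.0, 0.2])
```

Only the vectors stored under the linked entity are scored, so the cost of a search depends on the size of that Partition rather than on the size of the whole question bank.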

Question recommendation process

1. Query preprocessing

The Query entered by the user may contain non-standard content, including full-width characters, traditional Chinese characters, invalid characters and the like, so the Query needs to be preprocessed.

2. Mention identification

All Mentions in the preprocessed Query are identified using the Mention-Entity dictionary. The Mention recognition loads the Mention-Entity dictionary into an AC (Aho-Corasick) automaton and searches the preprocessed Query for Mentions.

3. Entity linking

The entities related to each identified Mention are obtained from the Mention-Entity dictionary to form a candidate Entity set. To obtain the entities related to the Query, the similarity between every Entity in the candidate Entity set and the Query is calculated, and the Entities whose similarity score is higher than a threshold are taken as the Query-related Entities.

The similarity measure is cosine similarity: the Query and the candidate entities are converted into vectors, and the cosine similarity between the two vectors is calculated.

The Query is converted into a vector as follows: 1) the Query is segmented into words using jieba; 2) stop words are removed; 3) the remaining tokens are converted into a vector.

The vector conversion of the Query and the candidate entities uses the same method as the vector conversion in question preprocessing. The similarity score threshold is 0.95; to avoid returning nothing, when no entity exceeds the threshold, the top-k entities by score are returned, where k is the number of Mentions (maximum matching).

4. Block search

According to the Query-related entities, the Partitions corresponding to those entities in the Collection are loaded, similarity search is performed within the Partitions, and the question ids are returned together with the similarity scores between the questions and the Query, sorted by similarity score from high to low.

The Partitions are obtained in the preprocessing stage; Partitions and entities correspond one-to-one, and each Partition stores the vectors of the questions related to its entity.

5. Question recommendation

From the results [question id, similarity score] obtained by the block search, the questions whose score is higher than a predefined threshold are taken as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.

In the invention, the knowledge graph serves as a bridge between the Query and the questions, relating the user's intent to the questions. The Milvus vector search engine is used to build one Partition per entity, and vector similarity search is performed only in the Partitions corresponding to the entities obtained by entity linking, which speeds up the search.

The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. An intelligent question recommendation method based on a knowledge graph and a vector search engine, characterized by comprising the following steps:

the first step: Query preprocessing;

the second step: Mention recognition;

the third step: entity linking;

the fourth step: block search;

the fifth step: question recommendation.

2. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the Query preprocessing of the first step includes: the Query entered by the user may contain non-standard content, including full-width characters, traditional Chinese characters, invalid characters and the like, so the Query needs to be preprocessed.

3. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the Mention recognition of the second step includes: identifying all Mentions in the preprocessed Query by using a Mention-Entity dictionary.

4. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the entity linking of the third step includes: obtaining the entities related to each identified Mention from the Mention-Entity dictionary to form a candidate Entity set; then, to obtain the entities related to the Query, calculating the similarity between every Entity in the candidate Entity set and the Query, and taking the Entities whose similarity score is higher than a threshold as the Query-related Entities.

5. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the block search of the fourth step includes: according to the Query-related entities, loading the Partitions corresponding to those entities in the Collection, performing similarity search within the Partitions, and returning the question ids together with the similarity scores between the questions and the Query, sorted by similarity score from high to low.

6. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the question recommendation of the fifth step includes: from the results [question id, similarity score] obtained by the block search, taking the questions whose score is higher than a predefined threshold as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.