CN116484008A - Intelligent question recommendation method based on knowledge graph and vector search engine - Google Patents


Publication number
CN116484008A
Authority
CN
China
Prior art keywords
entity
query
vector
similarity
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211698717.9A
Other languages
Chinese (zh)
Inventor
张博林
侯国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhixue Huijiao Hubei Education Technology Co ltd
Original Assignee
Zhixue Huijiao Hubei Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhixue Huijiao Hubei Education Technology Co ltd
Priority to CN202211698717.9A
Publication of CN116484008A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses an intelligent question recommendation method based on a knowledge graph and a vector search engine. The method comprises five steps: first, Query preprocessing; second, Mention recognition; third, entity linking; fourth, block search; fifth, question recommendation. The knowledge graph serves as a bridge between the Query and the questions. Using the Milvus vector search engine, one Partition is built per entity, and the vector similarity search for each entity obtained by entity linking runs only in the corresponding Partition, which increases search speed. The invention thus relates user intent to questions through the knowledge graph and accelerates search by partitioning the question vectors by entity in Milvus.

Description

Intelligent question recommendation method based on knowledge graph and vector search engine
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an intelligent question recommendation method based on a knowledge graph and a vector search engine.
Background
Current question recommendation methods are mostly keyword-based, and their accuracy is questionable. Methods based on vector similarity also exist, but once the question bank has grown beyond a certain size, their computation time becomes a concern.
The invention relates user intent to questions through a knowledge graph and accelerates search by partitioning the question vectors by entity in the vector search engine Milvus.
We therefore propose an intelligent question recommendation method based on a knowledge graph and a vector search engine.
Disclosure of Invention
The invention mainly addresses the technical problems of the prior art and provides an intelligent question recommendation method based on a knowledge graph and a vector search engine.
To this end, the invention adopts the following technical solution: an intelligent question recommendation method based on a knowledge graph and a vector search engine, comprising the following steps:
first step: Query preprocessing;
second step: Mention recognition;
third step: entity linking;
fourth step: block search;
fifth step: question recommendation.
Preferably, the Query preprocessing of the first step includes: the Query entered by the user may contain non-standard content, including full-width characters, Traditional Chinese characters and invalid characters, so the Query needs to be preprocessed;
the preprocessing of the Query comprises full-width to half-width conversion, Traditional-to-Simplified Chinese conversion and removal of invalid characters; in addition, letter case is converted to stay consistent with the Mention-Entity dictionary.
Preferably, the Mention recognition of the second step includes: identifying all Mentions in the preprocessed Query by using the Mention-Entity dictionary;
wherein the Mention recognition loads the Mention-Entity dictionary into an AC (Aho-Corasick) automaton and searches the preprocessed Query for Mentions.
Preferably, the entity linking of the third step includes: obtaining the entities related to the identified Mentions from the Mention-Entity dictionary to form a candidate entity set; to obtain the Query-related entities, calculating the similarity between the Query and every entity in the candidate entity set, and taking the entities whose similarity score is higher than a threshold as the Query-related entities;
the similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated;
the Query is converted as follows: 1) word segmentation of the Query with jieba; 2) stop-word removal; 3) vector conversion;
the vector conversion method for the Query and the candidate entities is the same as that used in question preprocessing;
the similarity score threshold is 0.95; to avoid returning nothing, when no entity passes the threshold, the top-k entities by score are returned, where k is the number of Mentions (maximal matching).
Preferably, the block search of the fourth step includes: for each Query-related entity, loading the Partition corresponding to that entity in the Collection, performing a similarity search in the Partition, and returning [question id, similarity score between the question and the Query] results in descending order of similarity score;
the Partitions are obtained in the preprocessing stage; Partitions and entities are in one-to-one correspondence, and each Partition stores the vectors of the questions related to its entity;
wherein the similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20 and 'nprobe' = 3, which respectively denote the vector index type, the distance metric, the number of clusters, and the number of top-ranked clusters searched during a similarity search.
Preferably, the question recommendation of the fifth step further includes: from the [question id, similarity score] results of the block search, taking the questions whose score is higher than a predefined threshold as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.
Advantageous effects
The invention provides an intelligent question recommendation method based on a knowledge graph and a vector search engine, with the following beneficial effects:
(1) In this method, the knowledge graph serves as a bridge between the Query and the questions. Using the Milvus vector search engine, one Partition is built per entity, and the vector similarity search for each entity obtained by entity linking runs only in the corresponding Partition, which increases search speed. The invention thus relates user intent to questions through the knowledge graph and accelerates search by partitioning the question vectors by entity in Milvus.
Drawings
FIG. 1 is a flow chart of the intelligent question recommendation method based on a knowledge graph and a vector search engine.
Detailed Description
The intelligent question recommendation method based on a knowledge graph and a vector search engine is shown in FIG. 1.
Preprocessing
1. Mention-Entity dictionary creation
A Mention-Entity dictionary is established based on the knowledge graph. A Mention is defined as a fragment of natural-language text that expresses an entity, including the entity's name, an alias, or a word that refers to the entity indirectly by expressing one of its important attributes.
The Mention-Entity dictionary is established as follows: traverse the graph triples (head, relation, tail); if the relation is Chinese name, foreign name, alias, abbreviation, synonym or phonetic notation, add the tail to the dictionary as a key with the head as its value; in addition, mine keywords from the text expressing the entity's important attributes by using the pre-trained models provided by PaddleNLP and the jieba library to obtain a set keys, and add each element of the set to the dictionary as a key with the head as its value.
The pre-trained models provided by PaddleNLP comprise a named entity recognition model and a part-of-speech analysis model.
The keywords are mined with manually defined rules.
Rule 1: the pattern 'sensory feature|term class' means that if the entity class of a token is 'sensory feature' or 'term class', the token is added to the set keys.
Rule 2: the pattern 'modifier|vocabulary term' followed by 'sensory feature' means that if a token whose entity class is 'sensory feature' is preceded by a token whose entity class is 'modifier' or 'vocabulary term', the two tokens are merged into one word and added to the set keys.
Rule 3: the pattern 'term class{2}' means that if a token whose entity class is 'term class' is preceded by a token whose entity class is also 'term class', the two tokens are merged into one word and added to the set keys.
Rule 4: jieba is used to extract the top 20 keywords by weight, and their parts of speech are analyzed; a keyword whose part of speech is noun or proper noun is added to the set keys.
Rule 5: a word whose part of speech is noun or proper noun and which contains only Chinese characters or only English letters is added to the set keys.
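The dictionary construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the English relation labels, the sample triples, the class label "term" and the helper names are all hypothetical, and a Mention is allowed to map to several entities (a set), which the text does not specify.

```python
from collections import defaultdict

# Relations whose tail is a surface form (Mention) of the head entity.
# These labels are placeholders for the relation names used in the real graph.
NAME_RELATIONS = {
    "chinese_name", "foreign_name", "alias",
    "abbreviation", "synonym", "phonetic_notation",
}

def build_mention_entity_dict(triples):
    """Map each Mention (tail) to the set of entities (heads) it may refer to."""
    mention_to_entities = defaultdict(set)
    for head, relation, tail in triples:
        if relation in NAME_RELATIONS:
            mention_to_entities[tail].add(head)
    return mention_to_entities

def merge_adjacent_terms(tagged_tokens, target="term"):
    """Rule 3 sketch: join two consecutive tokens that both carry the target class.

    Tokens are concatenated without a separator, as is natural for Chinese text.
    """
    merged = []
    for (tok_a, cls_a), (tok_b, cls_b) in zip(tagged_tokens, tagged_tokens[1:]):
        if cls_a == target and cls_b == target:
            merged.append(tok_a + tok_b)
    return merged

# Invented sample data, for illustration only.
triples = [
    ("Pythagorean theorem", "alias", "gougu theorem"),
    ("Pythagorean theorem", "subject", "mathematics"),
    ("Newton's second law", "abbreviation", "F=ma"),
]
mention_dict = build_mention_entity_dict(triples)
```

Only the two naming triples contribute keys; the "subject" triple is ignored because its relation is not a naming relation.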
2. Question preprocessing
Question preprocessing comprises establishing a question Id-question vector dictionary, a question Id-question type dictionary, a question Id-question difficulty dictionary and an Entity-question Id dictionary, and updating the Mention-Entity dictionary.
The features of a question comprise its unique question Id, question content, question type, question difficulty and question analysis.
The question Id-question vector dictionary is established by adding the unique question Id as the key and the question content, converted into a vector, as the value. The vector conversion works as follows: pre-trained word vectors are loaded with the Gensim library; conversion takes the token as its basic unit; if a token exists in the pre-trained word vectors, its vector is looked up directly; if it does not, the vectors of its individual characters are looked up and averaged to give the token vector; finally, all token vectors are averaged to give the result vector.
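The token-level averaging with a character-level fallback can be sketched as follows. A plain dict stands in for the pre-trained word vectors that the text loads with Gensim, and the function names and toy vectors are hypothetical. How to treat a character that has no vector at all is not stated in the text; this sketch simply skips it.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def token_vector(token, word_vectors):
    """Vector for one token; falls back to the mean of its character vectors."""
    if token in word_vectors:
        return word_vectors[token]
    char_vectors = [word_vectors[ch] for ch in token if ch in word_vectors]
    return mean_vector(char_vectors) if char_vectors else None

def text_vector(tokens, word_vectors):
    """Average all token vectors; returns None if no token is covered."""
    vectors = [token_vector(t, word_vectors) for t in tokens]
    vectors = [v for v in vectors if v is not None]
    return mean_vector(vectors) if vectors else None

# Toy 2-dimensional vectors; "ab" is out of vocabulary, so it is built from "a" and "b".
word_vectors = {"right": [1.0, 0.0], "a": [0.0, 1.0], "b": [0.0, 3.0]}
vec = text_vector(["right", "ab"], word_vectors)
```

Here the token "ab" receives the character average [0.0, 2.0], and the final vector is the mean of that and the vector of "right".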
The question Id-question type dictionary is established by adding the unique question Id as the key and the question type as the value. There are five question types: single-choice, multiple-choice, fill-in-the-blank, short-answer and comprehensive questions; each question has exactly one type.
The question Id-question difficulty dictionary is established by adding the unique question Id as the key and the question difficulty as the value. The question difficulty is numeric and ranges from 1 to 100.
The Entity-question Id dictionary is established by adding each Entity contained in the question content as the key and the unique question Id as the value.
The Mention-Entity dictionary is updated by adding the relations between the entities, keywords and effective words in the question content and the question's entities; the entities, keywords and effective words are extracted from the question content in the same way as from the descriptive text of the tail entity when the Mention-Entity dictionary is established.
3. Partition preloading
The vector search engine Milvus is used to speed up similarity calculation. A Collection is defined, and the question bank is divided into several Partitions by Entity; Partitions and Entities are in one-to-one correspondence, and each Partition stores the vectors of the questions related to its Entity, taken from the question Id-question vector dictionary.
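The grouping that decides what goes into each Partition can be sketched without a running Milvus server. The sketch below only builds the per-entity groups in memory; the function name and sample data are hypothetical, and loading each group into an actual Milvus Partition is left out.

```python
def build_partitions(entity_to_question_ids, question_id_to_vector):
    """Group question vectors by entity: one group (future Partition) per entity."""
    partitions = {}
    for entity, question_ids in entity_to_question_ids.items():
        partitions[entity] = [
            (qid, question_id_to_vector[qid])
            for qid in question_ids
            if qid in question_id_to_vector  # skip ids that have no vector
        ]
    return partitions

# Invented sample data: question q2 mentions both entities, so it lands in both groups.
entity_to_question_ids = {"triangle": ["q1", "q2"], "circle": ["q2"]}
question_id_to_vector = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
partitions = build_partitions(entity_to_question_ids, question_id_to_vector)
```

Because a question may contain several entities, its vector can appear in more than one group; the search cost per Query then depends on the size of the linked entities' Partitions rather than on the whole question bank.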
Question recommendation process
1. Query preprocessing
The Query entered by the user may contain non-standard content, including full-width characters, Traditional Chinese characters and invalid characters, so the Query needs to be preprocessed.
The preprocessing of the Query comprises full-width to half-width conversion, Traditional-to-Simplified Chinese conversion and removal of invalid characters; in addition, letter case is converted to stay consistent with the Mention-Entity dictionary.
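A minimal sketch of these preprocessing steps follows. Several details are assumptions rather than statements from the text: the three-character Traditional-to-Simplified table is a toy stand-in for a full conversion resource, "invalid characters" are taken to be anything other than CJK ideographs, ASCII letters, digits and spaces, and case conversion is taken to mean lowercasing.

```python
import re

# Toy Traditional-to-Simplified table (assumption; a real system needs a full mapping).
TRAD_TO_SIMP = str.maketrans({"學": "学", "題": "题", "數": "数"})

def full_to_half(text):
    """Convert full-width forms (U+FF01..U+FF5E) and the ideographic space to ASCII."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:     # full-width ASCII variants
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

def preprocess_query(query):
    text = full_to_half(query)
    text = text.translate(TRAD_TO_SIMP)
    # Keep CJK ideographs, ASCII letters, digits and spaces; drop everything else.
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9 ]", "", text)
    return text.lower().strip()

cleaned = preprocess_query("ＡＢＣ數學題！！")
```

The order matters: width conversion runs first so that full-width letters and punctuation are already in ASCII form when the invalid-character filter and lowercasing are applied.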
2. Mention identification
All Mentions in the preprocessed Query are identified by using the Mention-Entity dictionary. The Mention recognition loads the Mention-Entity dictionary into an AC (Aho-Corasick) automaton and searches the preprocessed Query for Mentions.
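For readers unfamiliar with the AC automaton, the following is a compact, self-contained Aho-Corasick sketch that finds every dictionary key occurring in a Query in a single pass. The class and the sample patterns are illustrative; the text does not say which implementation or library is actually used.

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton over a list of string patterns."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns recognized when reaching each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the patterns that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every occurrence in text."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits

automaton = AhoCorasick(["he", "she", "hers"])
hits = automaton.search("ushers")
```

In the recommendation flow the patterns would be the keys of the Mention-Entity dictionary and the text would be the preprocessed Query; overlapping hits are all reported, so a later step can choose among them.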
3. Entity linking
The entities related to the identified Mentions are obtained from the Mention-Entity dictionary to form a candidate entity set. To obtain the Query-related entities, the similarity between the Query and every entity in the candidate entity set is calculated, and the entities whose similarity score is higher than a threshold are taken as the Query-related entities.
The similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated.
The Query is converted as follows: 1) word segmentation of the Query with jieba; 2) stop-word removal; 3) vector conversion.
The vector conversion of the Query and the candidate entities is the same as the vector conversion method used in question preprocessing. The similarity score threshold is 0.95; to avoid returning nothing, when no entity passes the threshold, the top-k entities by score are returned, where k is the number of Mentions (maximal matching).
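The threshold-with-fallback selection can be sketched as below. The function names and the toy vectors are hypothetical, and reading k as the number of recognized Mentions is an interpretation of the translated text rather than a confirmed detail.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def link_entities(query_vec, candidates, num_mentions, threshold=0.95):
    """Select Query-related entities from candidates (dict: entity -> vector)."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), entity) for entity, vec in candidates.items()),
        reverse=True,
    )
    above = [entity for score, entity in scored if score > threshold]
    if above:
        return above
    # Fallback: nothing passed the threshold, so return the top-k entities by score,
    # with k taken as the number of recognized Mentions (assumption).
    return [entity for _, entity in scored[:num_mentions]]

candidates = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
strict = link_entities([1.0, 0.1], candidates, num_mentions=2)
fallback = link_entities([1.0, 0.5], candidates, num_mentions=2)
```

In the first call only entity A exceeds 0.95, so it alone is returned. In the second call no candidate passes (the best score is about 0.949), so the two highest-scoring entities are returned instead of an empty result.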
4. Block search
For each Query-related entity, the Partition corresponding to that entity in the Collection is loaded, a similarity search is performed in the Partition, and [question id, similarity score between the question and the Query] results are returned in descending order of similarity score.
The Partitions are obtained in the preprocessing stage; Partitions and entities are in one-to-one correspondence, and each Partition stores the vectors of the questions related to its entity.
The similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20 and 'nprobe' = 3, which respectively denote the vector index type, the distance metric, the number of clusters, and the number of top-ranked clusters searched during a similarity search.
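The parameter values listed above can be written down as two small Python dicts, shown here in the nested layout commonly used with the Milvus Python client; that layout is an assumption of this sketch, and the index type is written as IVF_FLAT, the Milvus name for an inverted-file index over uncompressed vectors. The helper functions illustrate one related point: with the inner-product ('IP') metric, scores equal cosine similarity only if the vectors are L2-normalized first, a step the text does not mention and which is assumed here.

```python
import math

# Values from the text; the nested "params" layout is an assumption of this sketch.
index_params = {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 20}}
search_params = {"metric_type": "IP", "params": {"nprobe": 3}}

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# After normalization, the inner product of two vectors is their cosine similarity.
score = inner_product(l2_normalize([3.0, 4.0]), l2_normalize([4.0, 3.0]))
```

With nlist = 20 and nprobe = 3, each search scans 3 of the 20 clusters in a Partition, trading a small loss of recall for less computation.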
5. Question recommendation
From the [question id, similarity score] results of the block search, the questions whose score is higher than a predefined threshold are taken as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.
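The final filtering and shuffling step is small enough to sketch directly. The function name, the sample scores and the threshold of 0.5 are placeholders; the text only says that the threshold is predefined.

```python
import random

def recommend(results, threshold, rng=None):
    """Keep questions scoring above the threshold and return them in random order.

    results: list of (question_id, similarity_score) pairs from the block search.
    """
    rng = rng or random.Random()
    related = [question_id for question_id, score in results if score > threshold]
    rng.shuffle(related)  # random order spreads exposure across related questions
    return related

results = [("q1", 0.9), ("q2", 0.4), ("q3", 0.8)]
picked = recommend(results, threshold=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the order reproducible for testing; in production the default unseeded generator would be used so that repeated Queries surface different related questions first.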
In the invention, the knowledge graph serves as a bridge between the Query and the questions. Using the Milvus vector search engine, one Partition is built per entity, and the vector similarity search for each entity obtained by entity linking runs only in the corresponding Partition, which increases search speed. The invention thus relates user intent to questions through the knowledge graph and accelerates search by partitioning the question vectors by entity in Milvus.
The foregoing shows and describes the basic principles, main features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate its principles, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. An intelligent question recommendation method based on a knowledge graph and a vector search engine, characterized by comprising the following steps:
first step: Query preprocessing;
second step: Mention recognition;
third step: entity linking;
fourth step: block search;
fifth step: question recommendation.
2. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the Query preprocessing of the first step includes: the Query entered by the user may contain non-standard content, including full-width characters, Traditional Chinese characters and invalid characters, so the Query needs to be preprocessed;
the preprocessing of the Query comprises full-width to half-width conversion, Traditional-to-Simplified Chinese conversion and removal of invalid characters; in addition, letter case is converted to stay consistent with the Mention-Entity dictionary.
3. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the Mention recognition of the second step includes: identifying all Mentions in the preprocessed Query by using the Mention-Entity dictionary;
wherein the Mention recognition loads the Mention-Entity dictionary into an AC (Aho-Corasick) automaton and searches the preprocessed Query for Mentions.
4. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the entity linking of the third step includes: obtaining the entities related to the identified Mentions from the Mention-Entity dictionary to form a candidate entity set; to obtain the Query-related entities, calculating the similarity between the Query and every entity in the candidate entity set, and taking the entities whose similarity score is higher than a threshold as the Query-related entities;
the similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated;
the Query is converted as follows: 1) word segmentation of the Query with jieba; 2) stop-word removal; 3) vector conversion;
the vector conversion method for the Query and the candidate entities is the same as that used in question preprocessing;
the similarity score threshold is 0.95; to avoid returning nothing, when no entity passes the threshold, the top-k entities by score are returned, where k is the number of Mentions (maximal matching).
5. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the block search of the fourth step includes: for each Query-related entity, loading the Partition corresponding to that entity in the Collection, performing a similarity search in the Partition, and returning [question id, similarity score between the question and the Query] results in descending order of similarity score;
the Partitions are obtained in the preprocessing stage; Partitions and entities are in one-to-one correspondence, and each Partition stores the vectors of the questions related to its entity;
wherein the similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20 and 'nprobe' = 3, which respectively denote the vector index type, the distance metric, the number of clusters, and the number of top-ranked clusters searched during a similarity search.
6. The intelligent question recommendation method based on a knowledge graph and a vector search engine according to claim 1, characterized in that the question recommendation of the fifth step further includes: from the [question id, similarity score] results of the block search, taking the questions whose score is higher than a predefined threshold as the Query-related questions. To ensure that questions get exposure, the related questions are returned in random order.
CN202211698717.9A 2022-12-28 2022-12-28 Intelligent question recommendation method based on knowledge graph and vector search engine Pending CN116484008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211698717.9A CN116484008A (en) 2022-12-28 2022-12-28 Intelligent question recommendation method based on knowledge graph and vector search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211698717.9A CN116484008A (en) 2022-12-28 2022-12-28 Intelligent question recommendation method based on knowledge graph and vector search engine

Publications (1)

Publication Number Publication Date
CN116484008A (en) 2023-07-25

Family

ID=87216082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211698717.9A Pending CN116484008A (en) 2022-12-28 2022-12-28 Intelligent question recommendation method based on knowledge graph and vector search engine

Country Status (1)

Country Link
CN (1) CN116484008A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20230725