CN116484008A - Intelligent question recommendation method based on knowledge graph and vector search engine - Google Patents
- Publication number
- CN116484008A (application number CN202211698717.9A)
- Authority
- CN
- China
- Prior art keywords
- entity
- query
- vector
- similarity
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of artificial intelligence and discloses an intelligent question recommendation method based on a knowledge graph and a vector search engine, comprising the following steps: Step 1, Query preprocessing; Step 2, Mention recognition; Step 3, entity linking; Step 4, block search; Step 5, question recommendation. The invention uses the knowledge graph to build a bridge between the Query and the questions, uses the Milvus vector search engine to build one Partition per entity, and performs the vector similarity search for each entity obtained by entity linking only within that entity's Partition, which further improves search speed. The invention thus connects user intent to questions through the knowledge graph and accelerates search by partitioning the Milvus vector search engine according to entity.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an intelligent question recommendation method based on a knowledge graph and a vector search engine.
Background
Most current question recommendation methods are keyword-based, and their accuracy leaves room for improvement. Methods based on vector similarity also exist, but once the number of questions in the question bank grows large enough, computation time becomes a concern.
The invention connects user intent to questions through a knowledge graph and accelerates search by partitioning the Milvus vector search engine according to entity.
Therefore, an intelligent question recommendation method based on a knowledge graph and a vector search engine is proposed.
Disclosure of Invention
The invention mainly addresses the technical problems of the prior art by providing an intelligent question recommendation method based on a knowledge graph and a vector search engine.
To achieve this purpose, the invention adopts the following technical solution: an intelligent question recommendation method based on a knowledge graph and a vector search engine, comprising the following steps:
Step 1: Query preprocessing;
Step 2: Mention recognition;
Step 3: entity linking;
Step 4: block search;
Step 5: question recommendation.
Preferably, the Query preprocessing of Step 1 includes the following: a Query entered by a user may contain non-standard content such as full-width characters, traditional Chinese characters, and invalid characters, so the Query must be preprocessed;
the Query preprocessing operations comprise full-width to half-width conversion, traditional-to-simplified Chinese conversion, and invalid-character removal; in addition, case conversion is performed to stay consistent with the Mention-Entity dictionary.
Preferably, the Mention recognition of Step 2 includes: identifying all mentions in the preprocessed Query using the Mention-Entity dictionary;
wherein Mention recognition loads the Mention-Entity dictionary into an Aho-Corasick (AC) automaton and runs a Mention search over the preprocessed Query.
Preferably, the entity linking of Step 3 includes: retrieving the entities associated with each identified Mention from the Mention-Entity dictionary to obtain a candidate Entity set; then, to determine the Query-related entities, calculating the similarity between the Query and every entity in the candidate Entity set and taking each Entity whose similarity score exceeds a threshold as a Query-related Entity;
the similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated;
the Query is converted into a vector in the following steps: 1) segmenting the Query into words with jieba; 2) removing stop words; 3) vector conversion;
the Query and the candidate entities are vectorized with the same method used in question preprocessing;
the similarity score threshold is 0.95; to avoid returning nothing, when no entity exceeds the threshold, the top-k entities by score are returned, where k is the number of mentions (maximum matching).
Preferably, the block search of Step 4 includes: for each Query-related entity, loading the Partition of the Collection that corresponds to that entity, performing a similarity search within the Partition, and returning the question IDs together with their similarity scores to the Query, sorted from high to low;
the Partitions are created in the preprocessing stage and correspond one-to-one with entities, each storing the vectors of the questions related to its entity;
the similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20, and 'nprobe' = 3, which respectively specify the vector index type, the distance metric (inner product), the number of clusters, and the number of nearest clusters searched during a similarity search.
Preferably, the question recommendation of Step 5 includes: from the [question ID, similarity score] results of the block search, taking the questions whose scores exceed a predefined threshold as the Query-related questions. To ensure that questions receive exposure, the related questions are returned at random.
Advantageous effects
The invention provides an intelligent question recommendation method based on a knowledge graph and a vector search engine, with the following beneficial effects:
(1) The method uses the knowledge graph to build a bridge between the Query and the questions, uses the Milvus vector search engine to build one Partition per entity, and performs the vector similarity search for each entity obtained by entity linking only within that entity's Partition, which further improves search speed. The invention thus connects user intent to questions through the knowledge graph and accelerates search by partitioning the Milvus vector search engine according to entity.
Drawings
FIG. 1 is a flow chart of the intelligent question recommendation method based on a knowledge graph and a vector search engine.
Detailed Description
The flow of the intelligent question recommendation method based on a knowledge graph and a vector search engine is shown in FIG. 1.
Preprocessing
1. Mention-Entity dictionary creation
A Mention-Entity dictionary is built from the knowledge graph. A Mention is a fragment of natural-language text that refers to an entity: the entity's name, an alias, or a word that refers to the entity indirectly by expressing one of its important attributes.
The Mention-Entity dictionary is built as follows: traverse the graph triples (head, relation, tail); if the relation is Chinese name, foreign name, alias, abbreviation, synonym, or phonetic notation, add the tail to the dictionary as a key with the head as its value. In addition, mine keywords from the text describing each entity's important attributes, using the pre-trained models provided by PaddleNLP and the jieba library, to obtain a set keys; add each element of keys to the dictionary as a key with the head as its value.
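The triple-traversal part of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the English relation labels, the lower-casing of keys (assumed because the Query is case-converted to match the dictionary), the use of a set of entities per mention (so one mention can yield several candidates), and the function name `build_mention_entity_dict` are all hypothetical, and the mined keywords are passed in precomputed.

```python
from collections import defaultdict

# Relations whose tail is another surface form of the head entity.
# English labels follow the translated text; real relation names are assumed.
ALIAS_RELATIONS = {
    "Chinese name", "foreign name", "alias",
    "abbreviation", "synonym", "phonetic notation",
}


def build_mention_entity_dict(triples, mined_keys=None):
    """Map each mention (key) to the set of entities it can refer to (value).

    triples: iterable of (head, relation, tail) from the knowledge graph.
    mined_keys: optional {head_entity: iterable of keywords} produced by the
        keyword-mining rules; passed in precomputed here.
    """
    mention_to_entities = defaultdict(set)
    for head, relation, tail in triples:
        if relation in ALIAS_RELATIONS:
            # tail is the mention (key), head is the entity (value)
            mention_to_entities[tail.lower()].add(head)
    for head, keys in (mined_keys or {}).items():
        for key in keys:
            mention_to_entities[key.lower()].add(head)
    # Keys are lower-cased (assumption) to match the case-converted Query.
    return dict(mention_to_entities)
```

Relations outside the alias set (for example a "founder" edge) contribute nothing, so only surface forms of an entity become keys.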
The pre-trained models provided by PaddleNLP include a named entity recognition model and a part-of-speech tagging model.
Keyword mining is implemented with manually defined rules:
Rule 1, 'sensory characteristic|term class': if a token's entity class is "sensory characteristic" or "term class", the token is added to keys.
Rule 2: if a token's entity class is "sensory characteristic" and the preceding token's entity class is "modifier" or "vocabulary term", the two tokens are merged into one word and added to keys.
Rule 3, 'term class{2}': if a token's entity class is "term class" and the preceding token's entity class is also "term class", the two tokens are merged into one word and added to keys.
Rule 4: jieba is used to extract the top 20 keywords by weight; the part of speech of each keyword is analyzed, and if it is a noun or a proper noun, the keyword is added to keys.
Rule 5: if a word's part of speech is noun or proper noun and the word consists only of Chinese characters or only of English letters, the word is added to keys.
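Rules 1-3 and 5 operate on already-tagged tokens, so they can be sketched without the models themselves. The sketch below assumes the NER model yields (word, entity_class) pairs using the English labels above and the POS model yields (word, tag) pairs with hypothetical noun tags `n` and `nz`; rule 4 is omitted because it needs jieba's keyword extractor (`jieba.analyse.extract_tags(text, topK=20)`) plus POS tagging. Function and constant names are illustrative.

```python
# Entity-class labels as translated in the text; the labels the PaddleNLP
# model actually emits are an assumption.
SENSORY = "sensory characteristic"
TERM = "term class"
MODIFIER_LIKE = {"modifier", "vocabulary term"}
# Hypothetical POS tags standing for "noun" and "proper noun".
NOUN_TAGS = {"n", "nz"}


def mine_by_entity_class(tokens):
    """Rules 1-3 over a list of (word, entity_class) pairs."""
    keys = set()
    for i, (word, cls) in enumerate(tokens):
        # Rule 1: keep sensory-characteristic and term-class tokens as-is.
        if cls in (SENSORY, TERM):
            keys.add(word)
        if i == 0:
            continue
        prev_word, prev_cls = tokens[i - 1]
        # Rule 2: modifier / vocabulary term followed by a sensory characteristic.
        if cls == SENSORY and prev_cls in MODIFIER_LIKE:
            keys.add(prev_word + word)
        # Rule 3: two consecutive term-class tokens ("term class{2}").
        if cls == TERM and prev_cls == TERM:
            keys.add(prev_word + word)
    return keys


def is_single_script(word):
    """True if the word is all CJK ideographs or all ASCII letters."""
    all_chinese = all("\u4e00" <= ch <= "\u9fff" for ch in word)
    all_english = word.isascii() and word.isalpha()
    return bool(word) and (all_chinese or all_english)


def mine_by_pos(words):
    """Rule 5 over a list of (word, pos_tag) pairs."""
    return {w for w, pos in words if pos in NOUN_TAGS and is_single_script(w)}
```

Merging concatenates tokens without a separator, which suits Chinese text; English tokens would need a space inserted.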
2. Question preprocessing
Question preprocessing comprises building a question ID-question vector dictionary, a question ID-question type dictionary, a question ID-question difficulty dictionary, and an Entity-question ID dictionary, and updating the Mention-Entity dictionary.
Each question has the following fields: a unique question ID, question content, question type, question difficulty, and question analysis.
The question ID-question vector dictionary uses the unique question ID as the key and the vectorized question content as the value. Vector conversion works as follows: pre-trained word vectors are loaded with the Gensim library, and conversion uses the token as its basic unit. If a token exists in the pre-trained word vectors, its vector is used directly; otherwise, the vectors of the token's individual characters are retrieved and their average is used as the token's vector. Finally, all token vectors are averaged to produce the result vector.
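The token-level fallback described above can be sketched as below. It assumes `word_vectors` is any mapping that supports `in` and indexing, which a Gensim `KeyedVectors` object does; skipping tokens for which not even a single character is known is an assumption, since the text does not say how such tokens are handled. Function names are illustrative.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def token_vector(token, word_vectors):
    """Vector for one token, falling back to the mean of its character vectors.

    Returns None if neither the token nor any of its characters is known.
    """
    if token in word_vectors:
        return list(word_vectors[token])
    char_vecs = [list(word_vectors[ch]) for ch in token if ch in word_vectors]
    return mean_vector(char_vecs) if char_vecs else None


def text_vector(tokens, word_vectors):
    """Average of all token vectors; wholly unknown tokens are skipped (assumption)."""
    vecs = [v for v in (token_vector(t, word_vectors) for t in tokens) if v is not None]
    return mean_vector(vecs) if vecs else None
```

The same `text_vector` would serve the Query and candidate entities, since the text requires all three to be vectorized identically.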
The question ID-question type dictionary uses the unique question ID as the key and the question type as the value. There are five question types: single choice, multiple choice, fill-in-the-blank, short answer, and comprehensive; each question has exactly one type.
The question ID-question difficulty dictionary uses the unique question ID as the key and the question difficulty as the value. Difficulty is numeric, ranging from 1 to 100.
The Entity-question ID dictionary uses each entity contained in the question content as a key and the unique question ID as the value.
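A sketch of the three lookup dictionaries, assuming each question arrives as a dict with the hypothetical keys `id`, `type`, `difficulty`, and `entities` (entities already extracted from the content). Because one entity can appear in many questions, the Entity-question ID values are kept as lists here; that choice, like the English type labels, is an assumption.

```python
from collections import defaultdict

# The five question types named in the text (English labels assumed).
QUESTION_TYPES = {"single choice", "multiple choice", "fill-in-the-blank",
                  "short answer", "comprehensive"}


def build_question_dicts(questions):
    """Return (id_to_type, id_to_difficulty, entity_to_ids)."""
    id_to_type, id_to_difficulty = {}, {}
    entity_to_ids = defaultdict(list)
    for q in questions:
        if q["type"] not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {q['type']}")
        if not 1 <= q["difficulty"] <= 100:
            raise ValueError(f"difficulty out of range: {q['difficulty']}")
        id_to_type[q["id"]] = q["type"]
        id_to_difficulty[q["id"]] = q["difficulty"]
        for entity in q["entities"]:
            entity_to_ids[entity].append(q["id"])
    return id_to_type, id_to_difficulty, dict(entity_to_ids)
```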
Updating the Mention-Entity dictionary means adding mappings from the entities, keywords, and valid words in the question content to the question's entities. These entities, keywords, and valid words are extracted from the question content in the same way as keywords are extracted from the descriptive text of tail entities when the Mention-Entity dictionary is built.
3. Partition preloading
The vector search engine Milvus is used to speed up similarity calculation. A Collection is defined, and the question bank is divided into multiple Partitions by Entity, with Partitions and entities in one-to-one correspondence; each Partition stores the vectors of the questions related to its Entity, taken from the question ID-question vector dictionary.
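Partition preloading might look like the sketch below. Two details are assumptions rather than part of the text: entity names are mapped to synthetic partition names such as `entity_0`, because Milvus accepts only letters, digits, and underscores in partition names; and the collection is assumed to use an auto-generated primary key plus a separate `question_id` field, so a question related to several entities can sit in several partitions without duplicate primary keys. `load_into_milvus` takes an existing pymilvus ORM `Collection` and uses its `has_partition`, `create_partition`, `insert`, and `flush` methods.

```python
def plan_partitions(entity_to_ids, id_to_vector):
    """Group question vectors by entity.

    Returns {entity: (partition_name, question_ids, vectors)}. The caller
    should persist the entity -> partition_name mapping for query time.
    """
    plan = {}
    for i, entity in enumerate(sorted(entity_to_ids)):
        ids = [qid for qid in entity_to_ids[entity] if qid in id_to_vector]
        plan[entity] = (f"entity_{i}", ids, [id_to_vector[qid] for qid in ids])
    return plan


def load_into_milvus(collection, plan):
    """Write the plan into a pymilvus Collection whose schema is assumed to be:
    auto_id INT64 primary key, 'question_id' INT64, 'embedding' FLOAT_VECTOR."""
    for partition_name, ids, vectors in plan.values():
        if not collection.has_partition(partition_name):
            collection.create_partition(partition_name)
        if ids:
            # Column-based insert; the auto-generated primary key is omitted.
            collection.insert([ids, vectors], partition_name=partition_name)
    collection.flush()
```

Milvus also caps the number of partitions per collection (the limit is configurable), which bounds how many entities this one-partition-per-entity layout can serve.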
Question recommendation process
1. Query preprocessing
A user's Query may contain non-standard content such as full-width characters, traditional Chinese characters, and invalid characters, so the Query must be preprocessed.
Query preprocessing comprises full-width to half-width conversion, traditional-to-simplified Chinese conversion, and invalid-character removal; in addition, case conversion is performed to stay consistent with the Mention-Entity dictionary.
2. Mention identification
All mentions in the preprocessed Query are identified using the Mention-Entity dictionary: the dictionary is loaded into an Aho-Corasick (AC) automaton, which then searches the preprocessed Query for mentions.
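A minimal pure-Python Aho-Corasick automaton illustrates the Mention search: it reports every dictionary key that occurs in the Query in a single pass, overlapping matches included. This is an illustrative sketch rather than the patent's code; in practice a library such as pyahocorasick would likely be used, and each matched key would then be looked up in the Mention-Entity dictionary.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over the Mention-Entity dictionary keys."""

    def __init__(self, patterns):
        self.goto = [{}]   # state -> {char: next_state}
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns recognized on reaching each state
        for p in patterns:
            if p:          # ignore empty keys
                self._add(p)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first: depth-1 states keep fail = 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the fallback state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Yield (start, end, pattern) for every occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                yield i - len(p) + 1, i + 1, p
```

Building the automaton once from the dictionary keys makes each Query scan linear in the Query length plus the number of matches, regardless of dictionary size.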
3. Entity linking
The entities associated with each identified Mention are retrieved from the Mention-Entity dictionary to form a candidate Entity set. To determine the Query-related entities, the similarity between the Query and every entity in the candidate set is calculated, and each Entity whose similarity score exceeds a threshold is taken as a Query-related Entity.
The similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated.
The Query is converted into a vector in three steps: 1) segmenting the Query into words with jieba; 2) removing stop words; 3) vector conversion.
The Query and the candidate entities are vectorized with the same method used in question preprocessing. The similarity score threshold is 0.95; to avoid returning nothing, when no entity exceeds the threshold, the top-k entities by score are returned, where k is the number of mentions (maximum matching).
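The linking rule (cosine similarity, a 0.95 threshold, and a top-k fallback) can be sketched as follows. The Query vector and candidate vectors are assumed to come from the vectorization used in question preprocessing (for the Query, after `jieba.lcut` segmentation and stop-word removal); reading k as the number of recognized mentions follows the description above, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link_entities(query_vec, candidates, num_mentions, threshold=0.95):
    """Return the Query-related entities.

    candidates: {entity: vector} for every entity reachable from the
        recognized mentions via the Mention-Entity dictionary.
    num_mentions: k in the fallback rule.
    """
    scored = sorted(
        ((cosine(query_vec, vec), entity) for entity, vec in candidates.items()),
        reverse=True,
    )
    linked = [entity for score, entity in scored if score > threshold]
    if not linked:
        # Fallback: take the top-k entities so the Query never links to nothing.
        linked = [entity for _, entity in scored[:num_mentions]]
    return linked
```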
4. Block search
For each Query-related entity, the Partition of the Collection that corresponds to that entity is loaded and a similarity search is performed within it, returning question IDs with their similarity scores to the Query, sorted from high to low.
The Partitions are created in the preprocessing stage and correspond one-to-one with entities, each storing the vectors of the questions related to its entity.
The similarity search parameters are 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20, and 'nprobe' = 3, which respectively specify the vector index type, the distance metric (inner product), the number of clusters, and the number of nearest clusters searched during a similarity search.
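A sketch of the block search with the pymilvus ORM, using the parameters listed above. The field names `embedding` and `question_id`, the `limit` of 50, and the `merge_hits` step are assumptions: searching several entity Partitions at once can return the same question more than once, so each ID keeps its best score. With `metric_type` 'IP', scores equal cosine similarity only if vectors are L2-normalized before insertion, which the text does not specify. The index would be built once with `collection.create_index(field_name="embedding", index_params=INDEX_PARAMS)`.

```python
# Parameters as listed in the text: IVF_FLAT index, inner product, nlist=20, nprobe=3.
INDEX_PARAMS = {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 20}}
SEARCH_PARAMS = {"metric_type": "IP", "params": {"nprobe": 3}}


def search_partitions(collection, query_vec, partition_names, limit=50):
    """Search only the Partitions of the linked entities in a pymilvus Collection."""
    collection.load(partition_names=partition_names)
    results = collection.search(
        data=[query_vec],
        anns_field="embedding",
        param=SEARCH_PARAMS,
        limit=limit,
        partition_names=partition_names,
        output_fields=["question_id"],
    )
    # results[0] holds the hits for the single query vector.
    return merge_hits((hit.entity.get("question_id"), hit.distance) for hit in results[0])


def merge_hits(hits):
    """Keep each question ID's best score and sort from high to low."""
    best = {}
    for qid, score in hits:
        if qid not in best or score > best[qid]:
            best[qid] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```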
5. Question recommendation
From the [question ID, similarity score] results of the block search, questions whose scores exceed a predefined threshold are taken as Query-related questions. To ensure that questions receive exposure, the related questions are returned at random.
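The final step reduces to a threshold filter followed by random selection. The threshold value and the number `n` of questions returned are not given in the text, so both are parameters here; passing a seeded `random.Random` makes the output reproducible for testing. The function name is illustrative.

```python
import random


def recommend(ranked_hits, threshold, n, rng=random):
    """ranked_hits: [(question_id, score)] from the block search.

    Keeps questions scoring above the threshold and returns up to n of them
    chosen at random, so lower-ranked related questions also get exposure.
    """
    related = [qid for qid, score in ranked_hits if score > threshold]
    return rng.sample(related, min(n, len(related)))
```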
The invention uses the knowledge graph to build a bridge between the Query and the questions, uses the Milvus vector search engine to build one Partition per entity, and performs the vector similarity search for each entity obtained by entity linking only within that entity's Partition, which further improves search speed. The invention thus connects user intent to questions through the knowledge graph and accelerates search by partitioning the Milvus vector search engine according to entity.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (6)
1. An intelligent question recommendation method based on a knowledge graph and a vector search engine, characterized by comprising the following steps:
Step 1: Query preprocessing;
Step 2: Mention recognition;
Step 3: entity linking;
Step 4: block search;
Step 5: question recommendation.
2. The intelligent question recommendation method based on a knowledge graph and a vector search engine of claim 1, characterized in that the Query preprocessing of Step 1 comprises: a Query entered by a user may contain non-standard content such as full-width characters, traditional Chinese characters, and invalid characters, so the Query must be preprocessed;
the Query preprocessing operations comprise full-width to half-width conversion, traditional-to-simplified Chinese conversion, and invalid-character removal; in addition, case conversion is performed to stay consistent with the Mention-Entity dictionary.
3. The intelligent question recommendation method based on a knowledge graph and a vector search engine of claim 1, characterized in that the Mention recognition of Step 2 comprises: identifying all mentions in the preprocessed Query using the Mention-Entity dictionary;
wherein Mention recognition loads the Mention-Entity dictionary into an Aho-Corasick (AC) automaton and runs a Mention search over the preprocessed Query.
4. The intelligent question recommendation method based on a knowledge graph and a vector search engine of claim 1, characterized in that the entity linking of Step 3 comprises: retrieving the entities associated with each identified Mention from the Mention-Entity dictionary to obtain a candidate Entity set; then, to determine the Query-related entities, calculating the similarity between the Query and every entity in the candidate Entity set and taking each Entity whose similarity score exceeds a threshold as a Query-related Entity;
the similarity measure is cosine similarity: the Query and each candidate entity are converted into vectors, and the cosine similarity between the two vectors is calculated;
the Query is converted into a vector in the following steps: 1) segmenting the Query into words with jieba; 2) removing stop words; 3) vector conversion;
the Query and the candidate entities are vectorized with the same method used in question preprocessing;
the similarity score threshold is 0.95; to avoid returning nothing, when no entity exceeds the threshold, the top-k entities by score are returned, where k is the number of mentions (maximum matching).
5. The intelligent question recommendation method based on a knowledge graph and a vector search engine of claim 1, characterized in that the block search of Step 4 comprises: for each Query-related entity, loading the Partition of the Collection that corresponds to that entity, performing a similarity search within the Partition, and returning the question IDs together with their similarity scores to the Query, sorted from high to low;
the Partitions are created in the preprocessing stage and correspond one-to-one with entities, each storing the vectors of the questions related to its entity;
the similarity search parameters are set to 'index_type' = 'IVF_FLAT', 'metric_type' = 'IP', 'nlist' = 20, and 'nprobe' = 3, which respectively specify the vector index type, the distance metric (inner product), the number of clusters, and the number of nearest clusters searched during a similarity search.
6. The intelligent question recommendation method based on a knowledge graph and a vector search engine of claim 1, characterized in that the question recommendation of Step 5 further comprises: from the [question ID, similarity score] results of the block search, taking the questions whose scores exceed a predefined threshold as the Query-related questions. To ensure that questions receive exposure, the related questions are returned at random.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211698717.9A CN116484008A (en) | 2022-12-28 | 2022-12-28 | Intelligent question recommendation method based on knowledge graph and vector search engine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211698717.9A CN116484008A (en) | 2022-12-28 | 2022-12-28 | Intelligent question recommendation method based on knowledge graph and vector search engine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116484008A true CN116484008A (en) | 2023-07-25 |
Family
ID=87216082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211698717.9A Pending CN116484008A (en) | 2022-12-28 | 2022-12-28 | Intelligent question recommendation method based on knowledge graph and vector search engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116484008A (en) |
- 2022-12-28 CN CN202211698717.9A (CN116484008A), Pending
Similar Documents
Publication | Title |
---|---|
CN112069298B (en) | Man-machine interaction method, device and medium based on semantic web and intention recognition |
CN107451126B (en) | Method and system for screening similar meaning words |
WO2021093755A1 (en) | Matching method and apparatus for questions, and reply method and apparatus for questions |
CN108304372B (en) | Entity extraction method and device, computer equipment and storage medium |
CN110377715A (en) | Reasoning type accurate intelligent answering method based on legal knowledge map |
CN110737758A (en) | Method and apparatus for generating a model |
CN113761890B (en) | Multi-level semantic information retrieval method based on BERT context awareness |
CN109271524B (en) | Entity Linking Method in Knowledge Base Question Answering System |
CN113505209A (en) | Intelligent question-answering system for automobile field |
CN110895559A (en) | Model training method, text processing method, device and equipment |
CN109614493B (en) | Text abbreviation recognition method and system based on supervision word vector |
CN112925918B (en) | Question-answer matching system based on disease field knowledge graph |
CN110399603A (en) | A kind of text-processing technical method and system based on sense-group division |
CN115858750A (en) | Power grid technical standard intelligent question-answering method and system based on natural language processing |
CN113032541A (en) | Answer extraction method based on bert and fusion sentence cluster retrieval |
CN114461774A (en) | Question-answering system search matching method based on semantic similarity and application thereof |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field |
CN111125299A (en) | Dynamic word bank updating method based on user behavior analysis |
CN115795018A (en) | Multi-strategy intelligent searching question-answering method and system for power grid field |
CN117743526A (en) | Table question-answering method based on large language model and natural language processing |
CN111538805A (en) | Text information extraction method and system based on deep learning and rule engine |
CN115994535A (en) | Text processing method and device |
CN111444720A (en) | Named entity recognition method for English text |
CN113468311B (en) | Knowledge graph-based complex question and answer method, device and storage medium |
CN114238595A (en) | Metallurgical knowledge question-answering method and system based on knowledge graph |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20230725 |