CN115964466A - Intelligent question-answering method and system based on four-layer feature vector matching model - Google Patents


Publication number
CN115964466A
Authority
CN
China
Prior art keywords
question
knowledge
answer
base
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211734224.6A
Other languages
Chinese (zh)
Inventor
贾晓霞
孟飞
朱雨洁
吕梦怡
钱大伟
周峰
范双全
邓苏
谢登峰
薛白石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinhang Digital Technology Co ltd
Original Assignee
Jinhang Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinhang Digital Technology Co ltd filed Critical Jinhang Digital Technology Co ltd
Priority to CN202211734224.6A priority Critical patent/CN115964466A/en
Publication of CN115964466A publication Critical patent/CN115964466A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an intelligent question-answering method and system based on a four-layer feature vector matching model. The method comprises the following steps: step S1: collecting and classifying knowledge, and constructing six knowledge warehouses (a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base) together with a structured data graph of the knowledge warehouses; step S2: constructing a first-layer question-answer pair model; step S3: constructing a second-layer structured data semantic generalization model; step S4: constructing a third-layer unstructured data knowledge graph; step S5: constructing a fourth-layer full-text retrieval index; step S6: constructing a four-layer result feedback model, in which the question or keyword that a user enters on the search page is first passed through the first-layer model and, if an answer is obtained, the result is pushed to the user; otherwise the query passes to the second-layer model, and so on through the layers until an answer is obtained; step S7: the user can give evaluation feedback on the answer obtained. The method allows knowledge to be queried from multiple angles, which improves search efficiency and accuracy.

Description

Intelligent question-answering method and system based on four-layer feature vector matching model
Technical Field
The invention relates to the field of intelligent question answering in artificial intelligence, in particular to an intelligent question answering method and system based on a four-layer feature vector matching model.
Background
Since the 20th century, with the rise of natural language understanding, semantic analysis, mobile internet technology and deep learning algorithms, intelligent question answering has advanced rapidly. Because intelligent question answering offers 24/7 online service, fast response, high concurrency and automatic replies, domestic enterprises have made successive breakthroughs in the field in recent years, and products such as Alibaba's AliMe (Ali Xiaomi) and Xiaomi's Xiao AI have appeared in large numbers and are widely used in daily life.
However, traditional intelligent question-answering methods only match a query against the questions in a question-answer base, one question-answer pair at a time; they cannot search for a question along multiple dimensions and cannot locate and retrieve content inside attachment data. Therefore, for multi-dimensional, multi-type question search and feedback, building a complete search-and-feedback mechanism on a knowledge graph can effectively improve search efficiency.
Whether an intelligent question-answering system can adapt to the business complexity of different industries and answer users' questions accurately depends mainly on two capabilities: first, whether it correctly understands the user's intention; second, whether it can accurately extract the corresponding answer from the knowledge warehouse.
At present, the mainstream approach to understanding user intention, both in China and abroad, is to segment the user's question into words by means of NLP, form word vectors, and then perform semantic analysis. In practice, however, factors such as a broad business domain and a large number of professional terms often reduce word-segmentation accuracy.
In the stage of extracting answers from the knowledge base after the user's intention has been understood, how well an answer matches the question depends on the accuracy of the classification model. How to improve the accuracy of that model and how to optimize the search logic are therefore also key issues in intelligent question-answering technology.
Disclosure of Invention
To solve the above technical problems, the invention provides an intelligent question-answering method and system based on a four-layer feature vector matching model.
The technical solution of the invention is as follows: an intelligent question-answering method based on a four-layer feature vector matching model comprises the following steps:
step S1: collecting and classifying knowledge, and constructing six knowledge warehouses (a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base) together with a structured data graph of the knowledge warehouses;
step S2: extracting bag-of-words vectors of the question-answer base based on the TF-IDF algorithm and the chi-square test, and training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm to obtain a trained question-answer pair model;
step S3: classifying the structured data in the ontology base, the knowledge base, the expert base, the note base and the forum base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model to obtain a trained structured data semantic generalization model;
step S4: extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and constructing an unstructured data knowledge graph;
step S5: configuring a full-text retrieval schema for the knowledge warehouse, establishing an index on the key information of the structured data, and establishing a full-text retrieval index for the unstructured data;
step S6: constructing a four-layer result feedback model: the user inputs a question or keyword to be retrieved on the search page; the question-answer pair model trained in step S2 calculates the similarity between the question to be retrieved and the questions in the question-answer base, and if the similarity is greater than a threshold a, the answer construction module pushes the answer to the user; otherwise, the query passes to the structured data semantic generalization model of step S3, which calculates the similarity between the question to be retrieved and the existing generalized sample templates, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query passes to the unstructured data knowledge graph of step S4, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query passes to the full-text retrieval schema of step S5, and matching answers are searched from the full text;
step S7: after obtaining an answer, the user can give evaluation feedback on it. If the feedback is that the question is solved, the user's portrait is updated according to the question, and questions the user may be interested in and counter-question guidance are generated; if the feedback is that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
Compared with the prior art, the invention has the following advantages:
1. The invention discloses an intelligent question-answering method based on a four-layer feature vector matching model that supports several input modes, including full questions and keywords. It is more intelligent and convenient than traditional keyword retrieval and better suited to practical business scenarios.
2. The invention improves the accuracy of the question-answer pair model and the structured data semantic generalization model. When the bag-of-words vectors of the question-answer base are constructed, the TF-IDF algorithm better reflects how relevant each word is to a question, and the chi-square test is used to reduce the dimensionality of the bag-of-words vectors and further eliminate irrelevant factors, so that the accuracy of the question-answer pair model on a test set can exceed 90%.
3. The invention copes better with complex business scenarios. The business fields involved in the whole life cycle of the equipment manufacturing industry are analyzed in depth, and a question-answer classification system and a domain synonym thesaurus are built on that basis, so that the question-answering system can better handle the various questions arising across that life cycle.
4. The four-layer feature vector matching system provided by the invention allows knowledge to be queried from multiple angles, and the construction of structured and unstructured graphs also improves the efficiency and accuracy of knowledge search.
Drawings
Fig. 1 is a flowchart of an intelligent question answering method based on a four-layer feature vector matching model in the embodiment of the present invention;
FIG. 2 is a schematic diagram of a question-answer pair model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an architecture of a semantic generalization model for structured data according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a semantic generalization process for structured data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a process for constructing a knowledge-graph of unstructured data according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a construction process of full-text search according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the first-layer process of retrieving answers, constructing answers and outputting results using the question-answer pair model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the second-layer process of retrieving answers, constructing answers and outputting results using the structured data semantic generalization model according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating an overall architecture of a four-layer result feedback model according to an embodiment of the present invention;
fig. 10 is a block diagram of an intelligent question-answering system based on a four-layer feature vector matching model in the embodiment of the present invention.
Detailed Description
The invention provides an intelligent question-answering method based on a four-layer feature vector matching model, which is used for constructing a four-layer search system, inquiring knowledge from multiple angles, and improving the search efficiency and the search accuracy of the knowledge by constructing a structured map and an unstructured map.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, an intelligent question-answering method based on a four-layer feature vector matching model provided in an embodiment of the present invention includes the following steps:
step S1: collecting and classifying knowledge, and constructing six knowledge warehouses (a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base) together with a structured data graph of the knowledge warehouses;
step S2: extracting bag-of-words vectors of the question-answer base based on the TF-IDF algorithm and the chi-square test, and training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm to obtain a trained question-answer pair model;
step S3: classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model to obtain a trained structured data semantic generalization model;
step S4: extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and constructing an unstructured data knowledge graph;
step S5: configuring a full-text retrieval schema for the knowledge warehouse, establishing an index on the key information of the structured data, and establishing a full-text retrieval index for the unstructured data;
step S6: constructing a four-layer result feedback model: the user inputs a question or keyword to be retrieved on the search page; the question-answer pair model trained in step S2 calculates the similarity between the question to be retrieved and the questions in the question-answer base, and if the similarity is greater than a threshold a, the answer construction module pushes the answer to the user; otherwise, the query passes to the structured data semantic generalization model of step S3, which calculates the similarity between the question to be retrieved and the existing generalized sample templates, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query passes to the unstructured data knowledge graph of step S4, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query passes to the full-text retrieval schema of step S5, and matching answers are searched from the full text;
step S7: after obtaining an answer, the user can give evaluation feedback on it. If the feedback is that the question is solved, the user's portrait is updated according to the question, and questions the user may be interested in and counter-question guidance are generated; if the feedback is that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
In one embodiment, step S1 (collecting and classifying knowledge, and constructing the six knowledge warehouses and their structured data graph) specifically comprises:
acquiring knowledge from a plurality of data sources, or importing it in batches, classifying the acquired knowledge according to six knowledge systems, and constructing six knowledge warehouses (a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base) together with a structured data graph of the knowledge warehouses; meanwhile, constructing a synonym thesaurus so that synonyms can be semantically replaced. In detail:
the application business field is analyzed in depth, and six knowledge systems are constructed: knowledge, ontology, question-answer, expert, note and forum;
knowledge can be imported into the system in batches or collected from third-party systems, and the collected knowledge is classified according to the six knowledge systems to build a unified knowledge warehouse. Knowledge can be collected from various data sources such as databases, documents and websites, and is brought into the business system through collection tasks. Meanwhile, to improve search accuracy, the embodiment of the invention constructs a synonym thesaurus, so that synonyms can be semantically replaced during a search.
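The synonym replacement mentioned above can be pictured with a minimal Python sketch. The thesaurus entries and the function name `normalize_query` are hypothetical and serve only as illustration; the patent does not specify how the thesaurus is stored or applied.

```python
# Hypothetical synonym thesaurus: each variant maps to one canonical term.
SYNONYMS = {
    "aircraft": "airplane",
    "plane": "airplane",
    "fix": "repair",
}

def normalize_query(tokens, synonyms=SYNONYMS):
    """Lower-case each token and replace it with its canonical synonym, if any."""
    return [synonyms.get(tok.lower(), tok.lower()) for tok in tokens]

normalized = normalize_query(["Fix", "plane", "engine"])
print(normalized)
```

Mapping every variant to one canonical term before vectorization means that differently worded questions can land on the same bag-of-words features.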
In one embodiment, step S2 (extracting bag-of-words vectors of the question-answer base based on the TF-IDF algorithm and the chi-square test, and training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm) specifically comprises:
step S21: performing feature extraction on the question-answer data in the question-answer base based on the TF-IDF algorithm: counting the term frequency of each word in a question, counting the inverse document frequency of each word across all texts, and obtaining the bag-of-words vector of the question-answer base by calculating the term frequency-inverse document frequency over all questions;
step S22: screening and optimizing the bag-of-words vector according to the chi-square value, using the chi-square test shown in formula (1), and vectorizing each question based on the bag-of-words vector to obtain a question vector;
$$\chi^2 = \sum \frac{(O - E)^2}{E} \tag{1}$$
where O is the actual (observed) value of the bag-of-words vector, E is the expected value of the bag-of-words vector, and $\chi^2$ is the chi-square value of the bag-of-words vector;
step S23: taking the question vector as input and the question ID as output, and performing model training with the naive Bayes classification algorithm to obtain a trained question-answer pair model.
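As a rough illustration of steps S21 and S22, the following pure-Python sketch computes TF-IDF weights and a chi-square score per word. The toy questions, the plain log(N/df) variant of inverse document frequency, and the 2x2 contingency table (word present or absent versus question ID) are assumptions made for this example; the patent does not state which variants are used.

```python
import math
from collections import Counter

# Toy question base (hypothetical data): (question ID, tokenized question).
QUESTIONS = [
    ("Q1", ["engine", "oil", "change"]),
    ("Q1", ["change", "engine", "oil", "interval"]),
    ("Q2", ["tire", "pressure", "check"]),
    ("Q2", ["check", "tire", "pressure", "interval"]),
]

def tf_idf(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def chi_square(term, label, samples):
    """Chi-square statistic sum((O - E)^2 / E) over the 2x2 table term x label."""
    n = len(samples)
    observed = Counter((term in doc, lab == label) for lab, doc in samples)
    chi2 = 0.0
    for has_term in (True, False):
        for in_class in (True, False):
            row = sum(observed[(has_term, c)] for c in (True, False))
            col = sum(observed[(t, in_class)] for t in (True, False))
            expected = row * col / n
            if expected > 0:
                chi2 += (observed[(has_term, in_class)] - expected) ** 2 / expected
    return chi2

labels = sorted({lab for lab, _ in QUESTIONS})
vocab = sorted({t for _, doc in QUESTIONS for t in doc})
# Keep only words whose presence depends on the question ID (chi-square > 0).
selected = [t for t in vocab
            if max(chi_square(t, lab, QUESTIONS) for lab in labels) > 0]
vectors = tf_idf([doc for _, doc in QUESTIONS])
print(selected)
```

In this toy data the word "interval" occurs equally often under both question IDs, so its chi-square value is 0 and it is dropped, while "engine" scores 4.0 and is kept. A real system would more likely keep the top-k words by score than use a zero cutoff.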
Fig. 2 is a schematic diagram of the structure of the question-answer pair model.
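Step S23 maps a vectorized question to a question ID with naive Bayes. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, written in pure Python; for brevity it works on raw word counts rather than the TF-IDF weighted vectors described above, and the training data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): (question ID, tokenized question).
SAMPLES = [
    ("Q1", ["engine", "oil", "change"]),
    ("Q1", ["change", "engine", "oil", "interval"]),
    ("Q2", ["tire", "pressure", "check"]),
    ("Q2", ["check", "tire", "pressure", "interval"]),
]

def train_nb(samples):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter(label for label, _ in samples)
    term_counts = defaultdict(Counter)
    for label, tokens in samples:
        term_counts[label].update(tokens)
    vocab = {t for _, tokens in samples for t in tokens}
    return class_counts, term_counts, vocab

def predict_nb(model, tokens):
    """Return the question ID with the highest posterior log-probability."""
    class_counts, term_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(SAMPLES)
print(predict_nb(model, ["oil", "change"]))
print(predict_nb(model, ["tire", "check"]))
```

Using the question ID as the class label, as the text describes, turns question matching into a classification problem with one class per stored question.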
In one embodiment, step S3 (classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model) specifically comprises:
configuring a CQL graph traversal query statement for each generalized sample template according to the preset forms what, where, who, which and how, constructing corresponding generalized samples from the generalized sample templates, and training the structured data semantic generalization model on the generalized samples to obtain the trained structured data semantic generalization model.
First, generalized sample templates in the forms what, where, who, which and how are configured; questions input by users are then converted according to these template forms, and the resulting generalized samples are used for model training.
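One way to picture a generalized sample template paired with its CQL statement is sketched below. Everything specific here is hypothetical: the node labels, relationship types and property names in the Cypher strings are invented, and a regular expression stands in for the trained naive Bayes matcher that the patent actually describes.

```python
import re

# Hypothetical generalized sample templates. Each pairs a question pattern
# with a parameterized Cypher (CQL) query; labels and properties are invented.
TEMPLATES = [
    {
        "form": "who",
        "pattern": re.compile(
            r"^who is the expert (?:on|for) (?P<topic>.+?)\??$", re.I),
        "cql": "MATCH (e:Expert)-[:SPECIALIZES_IN]->(t:Topic {name: $topic}) "
               "RETURN e.name",
    },
    {
        "form": "what",
        "pattern": re.compile(r"^what is (?P<name>.+?)\??$", re.I),
        "cql": "MATCH (k:Knowledge {title: $name}) RETURN k.summary",
    },
]

def match_template(question):
    """Return (form, cql, parameters) for the first matching template, else None."""
    for tpl in TEMPLATES:
        m = tpl["pattern"].match(question.strip())
        if m:
            return tpl["form"], tpl["cql"], m.groupdict()
    return None

result = match_template("Who is the expert on landing gear?")
print(result)
```

Keeping the query text separate from its parameters (`$topic`, `$name`) lets one template serve every question of the same form.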
FIG. 3 is a schematic diagram of the architecture of the semantic generalization model of the structured data.
FIG. 4 is a schematic diagram illustrating a semantic generalization process of structured data.
In one embodiment, step S4 (extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and constructing an unstructured data knowledge graph) specifically comprises:
constructing a dependency syntax tree for each sentence of the unstructured data in the knowledge warehouse by using the dependency syntax model, obtaining the entity-relationship triples of the sentence by syntactic analysis and the Brown word clustering algorithm, and forming the unstructured data knowledge graph through fusion calculation.
Triples are extracted from the content of unstructured data, such as attachments in the knowledge warehouse, through the dependency syntax model to obtain useful data such as entities and relationships; domain-specific entity-relationship data are extracted from the domain-specific corpus in the knowledge warehouse through corpus annotation and a named entity recognition model.
When the triple relationships of the unstructured data are extracted, a dependency syntax tree of each sentence is constructed with the dependency syntax model, the entity-relationship triples of the sentence are obtained by syntactic analysis and the Brown word clustering algorithm, entities with the same name within the same attachment are de-duplicated, and the unstructured data knowledge graph is formed through fusion calculation.
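The triple extraction and de-duplication can be sketched on an already-parsed sentence. The input format (word, head index, relation) and the relation names `nsubj` and `obj` are assumptions chosen for this example; the dependency parser itself and the Brown clustering step are not shown.

```python
def extract_triples(parsed_sentence):
    """Extract (subject, predicate, object) triples from dependency arcs.

    parsed_sentence is a list of (word, head_index, relation) tuples, where
    head_index points at the governing word (-1 for the root). This format
    is hypothetical and stands in for real parser output.
    """
    triples = []
    for word, head, rel in parsed_sentence:
        if rel != "nsubj":
            continue
        # Pair the subject with every object attached to the same verb.
        for obj_word, obj_head, obj_rel in parsed_sentence:
            if obj_head == head and obj_rel == "obj":
                triples.append((word, parsed_sentence[head][0], obj_word))
    return triples

def fuse(triples):
    """Drop duplicate triples while preserving first-seen order."""
    return list(dict.fromkeys(triples))

sentence = [("pump", 1, "nsubj"), ("drives", -1, "root"), ("rotor", 1, "obj")]
# The same sentence appearing twice in one attachment yields one triple.
graph_edges = fuse(extract_triples(sentence) + extract_triples(sentence))
print(graph_edges)
```

Each surviving triple becomes one edge of the unstructured data knowledge graph, with the subject and object as entity nodes.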
FIG. 5 is a schematic diagram of a process for constructing a knowledge graph of unstructured data.
In one embodiment, step S5 (configuring a full-text retrieval schema for the knowledge warehouse, establishing an index on the key information of the structured data, and establishing a full-text retrieval index for the unstructured data, wherein the key information includes title, keyword, abstract, category, security level, creator and release time) specifically comprises:
configuring a full-text retrieval schema for the knowledge warehouse; establishing an index on the structured data using title, keyword, abstract, category, security level, creator and release time as key information; and establishing a full-text retrieval index for the unstructured data. The search weight of each index field is also set, for example by giving the title a greater weight than the attachment; the higher the weight of the field in which a match occurs, the nearer the front the result appears in the query result list.
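The effect of field weights on result order can be shown with a small stand-alone scorer. The weight values, the field subset and the substring matching are all assumptions for illustration; a real deployment would rely on the ranking of whatever full-text engine is used, which the patent does not name.

```python
# Hypothetical search weights: a match in the title counts more than one
# in the attachment body.
FIELD_WEIGHTS = {"title": 3.0, "keyword": 2.0, "abstract": 1.5, "attachment": 1.0}

def score(doc, term):
    """Sum the weights of every indexed field that contains the term."""
    term = term.lower()
    return sum(weight for field, weight in FIELD_WEIGHTS.items()
               if term in doc.get(field, "").lower())

def search(docs, term):
    """Return IDs of matching documents, highest score first."""
    hits = [(score(d, term), d["id"]) for d in docs]
    return [doc_id for s, doc_id in sorted(hits, key=lambda h: -h[0]) if s > 0]

DOCS = [
    {"id": "B", "title": "Maintenance log", "attachment": "hydraulic pump replaced"},
    {"id": "A", "title": "Hydraulic pump manual", "attachment": "see chapter 3"},
    {"id": "C", "title": "Tire guide"},
]
print(search(DOCS, "pump"))
```

Document A matches in the title and B only in the attachment, so A is listed first even though B comes first in the input.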
Fig. 6 is a schematic diagram illustrating a construction process of full-text search.
In one embodiment, step S6 constructs the four-layer result feedback model: the user inputs a question or keyword to be retrieved on the search page; the question-answer pair model trained in step S2 calculates the similarity between the question to be retrieved and the existing questions in the question-answer base, and if the similarity is greater than a threshold a, the answer construction module pushes the answer to the user; otherwise, the query passes to the structured data semantic generalization model of step S3, which calculates the similarity between the question to be retrieved and the existing generalized sample templates, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query passes to the unstructured data knowledge graph of step S4, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query passes to the full-text retrieval schema of step S5, and matching answers are searched from the full text;
in this step, a four-layer result feedback model is constructed for answer retrieval, which specifically comprises:
1) A question or keyword is input on the search page; the question is segmented into words and stop words are removed;
2) Using the question-answer pair model trained in step S2, as shown in fig. 7, the cosine similarity between the question vector and each question in the question-answer base is calculated. When exactly one question has a similarity greater than the question-answer threshold a, its answer is returned directly. When several questions have similarities greater than threshold a and the difference between the two highest similarities is greater than the confidence difference, the answer of the best-matching question is returned directly. When several questions have similarities greater than threshold a but the difference between the two highest similarities is smaller than the confidence difference, a list of similar questions is returned to guide the user in pinning down the question. When no similarity exceeds threshold a, the process moves to the next step;
3) The question is passed to the structured data semantic generalization model trained in step S3, as shown in fig. 8, and the similarity between the current question and the existing generalized sample templates is calculated. If a matching template with a similarity greater than threshold b exists, the answer construction module pushes the answer to the user; otherwise the process moves to the next step;
4) The question is passed to the set of entity nodes in the unstructured data knowledge graph constructed in step S4. If unstructured entity nodes with the same or a similar name exist, the answer construction module pushes the data related to those nodes to the user as the answer; if none exist, the process moves to the next step;
5) The question is passed to the full-text retrieval schema constructed in step S5; the key fields defined by the schema are searched, and the data of the knowledge warehouse are matched against the full-text index.
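The decision rule of the first layer (threshold a plus the confidence difference between the two best matches) can be sketched as follows. The threshold values 0.8 and 0.1, the sparse-dict vector format and the function names are assumptions; the patent gives no numeric values. The case where the gap exactly equals the confidence difference is not specified in the text and is treated here as ambiguous.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def first_layer(query_vec, question_vecs, threshold_a=0.8, confidence_gap=0.1):
    """Return (action, question IDs) following the first-layer rules."""
    ranked = sorted(((cosine(query_vec, vec), qid)
                     for qid, vec in question_vecs.items()), reverse=True)
    above = [(s, qid) for s, qid in ranked if s > threshold_a]
    if not above:
        return ("next_layer", [])            # fall through to layer two
    if len(above) == 1 or above[0][0] - above[1][0] > confidence_gap:
        return ("answer", [above[0][1]])     # one clear best match
    return ("similar_list", [qid for _, qid in above])  # let the user choose

BASE = {
    "Q1": {"engine": 1.0, "oil": 1.0},
    "Q2": {"tire": 1.0},
    "Q3": {"engine": 1.0, "oil": 1.0, "filter": 0.1},
}
print(first_layer({"engine": 1.0, "oil": 1.0}, BASE))
print(first_layer({"brake": 1.0}, BASE))
```

Returning a list when the two best matches are too close avoids committing to an answer that is only marginally better than its runner-up.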
The answer construction module in this step presents answers as follows:
for results retrieved by the first-layer question-answer pair model, the matched question and its best answer are displayed together with a full-text retrieval result list;
for results retrieved by the second-layer structured data semantic generalization model, the matched generalized sample template is used to find the matching graph nodes or attributes according to its configured CQL graph traversal query statement, and these are displayed as a graph query result list together with a full-text retrieval result list;
for results retrieved from the entity nodes of the third-layer unstructured data knowledge graph, the knowledge list associated with the matched unstructured nodes is displayed together with a full-text retrieval result list;
for results retrieved by the fourth-layer full-text retrieval, a full-text retrieval result list is displayed, limited by the search range and search weights configured in the schema.
Fig. 9 shows a general architecture diagram of a four-layer result feedback model.
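The overall fall-through behaviour of the four layers can be sketched as an ordered list of handlers, each of which either returns an answer or defers to the next. The handler bodies below are placeholders with hypothetical data; only the control flow is meant to reflect the description.

```python
# Placeholder data for the stub layers (hypothetical).
QA_BASE = {"how to change the oil filter": "Unscrew the old filter and fit a new one."}
GRAPH_NODES = {"hydraulic pump": "Knowledge items linked to the 'hydraulic pump' node"}

def qa_layer(question):
    """Layer 1 stub: exact lookup stands in for the question-answer pair model."""
    return QA_BASE.get(question)

def template_layer(question):
    """Layer 2 stub: no generalized sample template matches in this sketch."""
    return None

def graph_layer(question):
    """Layer 3 stub: match an entity node whose name occurs in the question."""
    return next((info for name, info in GRAPH_NODES.items() if name in question), None)

def fulltext_layer(question):
    """Layer 4 stub: full-text retrieval always returns a result list."""
    return f"Full-text results for: {question}"

LAYERS = [
    ("question_answer_pair", qa_layer),
    ("structured_generalization", template_layer),
    ("unstructured_graph", graph_layer),
    ("full_text", fulltext_layer),
]

def answer_question(question, layers=LAYERS):
    """Try each layer in order and return the first non-empty result."""
    for name, handler in layers:
        result = handler(question)
        if result is not None:
            return name, result
    return "unanswered", None

print(answer_question("hydraulic pump noise"))
```

Ordering the layers from most precise to most general means the broad full-text search is only reached when the narrower matchers find nothing.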
In one embodiment, in step S7, after obtaining an answer the user can give evaluation feedback on it. If the feedback is that the question is solved, the user's portrait is updated according to the question, and questions the user may be interested in and counter-question guidance are generated; if the feedback is that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
The invention discloses an intelligent question-answering method based on a four-layer feature vector matching model that supports several input modes, including full questions and keywords; it is more intelligent and convenient than traditional keyword retrieval and better suited to practical business scenarios. The method improves the accuracy of the question-answer pair model and the structured data semantic generalization model. When the bag-of-words vectors of the question-answer base are constructed, the TF-IDF algorithm better reflects how relevant each word is to a question, and the chi-square test is used to reduce the dimensionality of the bag-of-words vectors and further remove irrelevant factors, so that the accuracy of the question-answer pair matching model on the test set can exceed 90%. The invention also copes better with complex business scenarios: the business fields involved in the whole life cycle of the equipment manufacturing industry are analyzed in depth, and a question-answer classification system and a domain synonym thesaurus are built on that basis, so that the question-answering system can better handle the various questions arising across that life cycle. The four-layer search system provided by the invention allows knowledge to be queried from multiple angles, and the construction of structured and unstructured graphs also improves the efficiency and accuracy of knowledge search.
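A minimal sketch of the feedback branch in step S7 follows. The profile structure, the `interests` key and the function name are hypothetical; the patent does not describe how the user portrait or the unknown-question base is stored.

```python
def record_feedback(user_profile, unknown_questions, question, resolved):
    """Update the user portrait on success, or queue the question for experts."""
    if resolved:
        # Solved: remember the question as an interest for later recommendations.
        user_profile.setdefault("interests", []).append(question)
    else:
        # Unsolved: store it in the unknown-question base for expert maintenance.
        unknown_questions.append(question)

profile, unknown = {}, []
record_feedback(profile, unknown, "hydraulic pump noise", resolved=True)
record_feedback(profile, unknown, "rotor balancing tolerance", resolved=False)
print(profile, unknown)
```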
Example two
As shown in fig. 10, an embodiment of the present invention provides an intelligent question-answering system based on a four-layer feature vector matching model, including the following modules:
the knowledge warehouse building module 81 is used for collecting knowledge, classifying the knowledge, and building six knowledge warehouses, namely a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base, together with a structured data graph of the knowledge warehouses;
the question-answer pair model training module 82 is used for extracting the bag-of-words vector of the question-answer base based on the TF-IDF algorithm and the chi-square test algorithm, and for training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm, to obtain a trained question-answer pair model;
the structured data semantic generalization model training module 83 is used for classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model, to obtain a trained structured data semantic generalization model;
the data knowledge graph establishing module 84 is used for extracting triples of unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data and establishing an unstructured data knowledge graph;
the full-text index building module 85 is used for configuring a full-text retrieval schema for the knowledge warehouse, building an index for the key information of the structured data, and building a full-text retrieval index for the unstructured data;
the four-layer result feedback model building module 86 operates as follows: a user inputs a question or keyword to be retrieved on a search page; the similarity between the question to be retrieved and the existing questions in the question-answer base is calculated by the trained question-answer pair model, and if the similarity is greater than a threshold a, an answer construction module pushes the answer to the user; otherwise, the query is passed to the structured data semantic generalization model, the similarity between the question to be retrieved and the existing generalized sample templates is calculated, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query is passed to the unstructured data knowledge graph, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query is passed to the full-text retrieval schema, and a matching answer is searched for in the full text;
the evaluation feedback module 87 operates as follows: after obtaining the answer, the user can give evaluation feedback on it; if the feedback indicates that the question is solved, the user's profile is updated according to the question, and questions the user may be interested in and follow-up question guidance are generated; if the feedback indicates that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
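The fall-through order of the four layers handled by modules 86 and 87 can be sketched as follows. This is only an illustration under stated assumptions: the function `answer_query`, the four `toy_*` layer functions, their data, and the default threshold values 0.8 and 0.7 are all hypothetical stand-ins, since the text does not give values for thresholds a and b.

```python
def answer_query(query, qa_match, template_match, graph_lookup,
                 full_text_search, threshold_a=0.8, threshold_b=0.7):
    """Try each layer in order and return (layer_name, answer)."""
    # Layer 1: question-answer pair model, accepted above threshold a.
    answer, score = qa_match(query)
    if score > threshold_a:
        return "qa_pair", answer
    # Layer 2: structured data semantic generalization model, threshold b.
    answer, score = template_match(query)
    if score > threshold_b:
        return "structured_template", answer
    # Layer 3: unstructured data graph, exact entity-name match.
    answer = graph_lookup(query)
    if answer is not None:
        return "unstructured_graph", answer
    # Layer 4: full-text retrieval as the last resort.
    return "full_text", full_text_search(query)

# Hypothetical stand-ins for the trained models and indexes.
def toy_qa_match(query):
    known = {"how to replace pump seal": "See maintenance manual section 3."}
    return (known[query], 1.0) if query in known else (None, 0.0)

def toy_template_match(query):
    prefix = "who maintains "
    if query.startswith(prefix):
        return ("Lookup owner of " + query[len(prefix):], 0.9)
    return (None, 0.0)

def toy_graph_lookup(query):
    nodes = {"pump seal": "A pump seal prevents leakage along the shaft."}
    return nodes.get(query)

def toy_full_text_search(query):
    documents = ["Spare parts ordering guide", "Pump seal inspection checklist"]
    words = query.lower().split()
    return [d for d in documents if any(w in d.lower() for w in words)]

LAYERS = (toy_qa_match, toy_template_match, toy_graph_lookup, toy_full_text_search)

if __name__ == "__main__":
    for q in ("how to replace pump seal", "who maintains pump 7",
              "pump seal", "ordering"):
        print(q, "->", answer_query(q, *LAYERS))
```

Returning the layer name together with the answer is a design choice of this sketch; it would let an evaluation step record which layer produced an answer that the user marked as unsolved.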
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (7)

1. An intelligent question-answering method based on a four-layer feature vector matching model is characterized by comprising the following steps:
step S1: collecting knowledge and classifying the knowledge, and constructing six knowledge warehouses, namely a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base, together with a structured data graph of the knowledge warehouses;
step S2: extracting the bag-of-words vector of the question-answer base based on the TF-IDF algorithm and the chi-square test algorithm, and training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm, to obtain a trained question-answer pair model;
step S3: classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model, to obtain a trained structured data semantic generalization model;
step S4: extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and constructing an unstructured data knowledge graph;
step S5: configuring a full-text retrieval schema for the knowledge warehouse, establishing an index for key information of the structured data, and establishing a full-text retrieval index for the unstructured data;
step S6: constructing a four-layer result feedback model: a user inputs a question or keyword to be retrieved on a search page; the similarity between the question to be retrieved and the existing questions in the question-answer base is calculated by the question-answer pair model trained in step S2, and if the similarity is greater than a threshold a, an answer construction module pushes the answer to the user; otherwise, the query is passed to the structured data semantic generalization model of step S3, the similarity between the question to be retrieved and the existing generalized sample templates is calculated, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query is passed to the unstructured data knowledge graph of step S4, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query is passed to the full-text retrieval schema of step S5, and a matching answer is searched for in the full text;
step S7: after obtaining the answer, the user can give evaluation feedback on it; if the feedback indicates that the question is solved, the user's profile is updated according to the question, and questions the user may be interested in and follow-up question guidance are generated; if the feedback indicates that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
2. The intelligent question-answering method based on the four-layer feature vector matching model according to claim 1, wherein the step S1 of collecting knowledge and classifying the knowledge, and constructing six knowledge warehouses, namely a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base, together with a structured data graph of the knowledge warehouses, specifically comprises:
collecting knowledge from a plurality of data sources, or importing the knowledge in batches, classifying the collected knowledge according to six knowledge systems, and constructing the six knowledge warehouses, namely the knowledge base, the ontology base, the question-answer base, the expert base, the note base and the forum base, together with the structured data graph of the knowledge warehouses; and, at the same time, constructing a synonym thesaurus so that synonyms can be semantically replaced.
3. The intelligent question-answering method based on the four-layer feature vector matching model according to claim 2, wherein the step S2 of extracting the bag-of-words vector of the question-answer base based on the TF-IDF algorithm and the chi-square test algorithm, and training a question-answer pair model on all data in the question-answer base based on the naive Bayes algorithm to obtain a trained question-answer pair model, specifically comprises:
step S21: performing feature extraction on the question-answer data in the question-answer base based on the TF-IDF algorithm, counting the term frequency of each word in a question and the inverse document frequency of each word across all texts, and calculating the term frequency-inverse document frequency for all questions to obtain the bag-of-words vector of the question-answer base;
step S22: screening and optimizing the bag-of-words vector according to the chi-square value by using the chi-square test algorithm shown in formula (1), and vectorizing each question based on the bag-of-words vector to obtain a question vector;
$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad (1)$$
wherein $O$ is the actual value of the bag-of-words vector, $E$ is the expected value of the bag-of-words vector, and $\chi^2$ is the chi-square value of the bag-of-words vector;
step S23: taking the question vector as input and the question ID as output, and performing model training by using a naive Bayes classification algorithm to obtain the trained question-answer pair model.
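As an illustration of step S23 only, the following minimal sketch trains a multinomial naive Bayes classifier that maps tokenized questions to question IDs. Several points are assumptions rather than part of the claim: raw word counts are used in place of TF-IDF question vectors to keep the example short, Laplace smoothing is applied, the posterior probability of the best question ID is treated as the similarity score, and the sample questions and the names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (tokens, question_id) pairs. Returns a model dict."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, qid in samples:
        class_counts[qid] += 1
        term_counts[qid].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "term_counts": term_counts, "vocab": vocab}

def predict(model, tokens):
    """Return (best_question_id, posterior probability of that ID)."""
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for qid, n in model["class_counts"].items():
        counts = model["term_counts"][qid]
        denom = sum(counts.values()) + vocab_size  # Laplace smoothing
        score = math.log(n / total)                # class prior
        for t in tokens:
            if t in model["vocab"]:                # ignore unseen words
                score += math.log((counts[t] + 1) / denom)
        log_scores[qid] = score
    best = max(log_scores, key=log_scores.get)
    top = log_scores[best]
    # Normalize in log space so the posterior of the best class is 1 / norm.
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return best, 1.0 / norm

# Invented paraphrases of two stored questions, Q1 and Q2.
SAMPLES = [
    (["replace", "pump", "seal"], "Q1"),
    (["change", "pump", "seal"], "Q1"),
    (["order", "spare", "part"], "Q2"),
    (["buy", "spare", "part"], "Q2"),
]

if __name__ == "__main__":
    model = train_naive_bayes(SAMPLES)
    print(predict(model, ["replace", "seal"]))
```

Under these assumptions, the returned probability could be compared with threshold a to decide whether the stored answer for the predicted question ID is pushed to the user.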
4. The intelligent question-answering method based on the four-layer feature vector matching model according to claim 1, wherein the step S3 of classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model to obtain a trained structured data semantic generalization model, specifically comprises:
configuring a CQL graph traversal query statement for each generalized sample template according to the preset what, when, where, who, which and how forms, constructing the corresponding generalized samples according to the generalized sample templates, and training the structured data semantic generalization model with the generalized samples to obtain the trained structured data semantic generalization model.
5. The intelligent question-answering method based on the four-layer feature vector matching model according to claim 1, wherein the step S4 of extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and constructing an unstructured data knowledge graph, specifically comprises:
constructing a dependency syntax tree for each sentence of the unstructured data in the knowledge warehouse by using the dependency syntax model, obtaining the entity-relationship triples of the sentence by syntactic analysis and the Brown word clustering algorithm, and forming the unstructured data knowledge graph through fusion calculation.
6. The intelligent question-answering method based on the four-layer feature vector matching model according to claim 1, wherein the step S5 of configuring a full-text retrieval schema for the knowledge warehouse, establishing an index for key information of the structured data, and establishing a full-text retrieval index for the unstructured data, specifically comprises:
configuring the full-text retrieval schema for the knowledge warehouse; establishing an index on the structured data using the title, keyword, abstract, category, security level, creator and release time as the key information; establishing a full-text retrieval index for the unstructured data; and setting the search weight of each index field.
7. An intelligent question-answering system based on a four-layer feature vector matching model is characterized by comprising the following modules:
the knowledge warehouse building module is used for collecting knowledge and classifying the knowledge, and building six knowledge warehouses, namely a knowledge base, an ontology base, a question-answer base, an expert base, a note base and a forum base, together with a structured data graph of the knowledge warehouses;
the question-answer pair model training module is used for extracting the bag-of-words vector of the question-answer base based on the TF-IDF algorithm and the chi-square test algorithm, and for training a question-answer pair model on the data in the question-answer base based on the naive Bayes algorithm, to obtain a trained question-answer pair model;
the structured data semantic generalization model training module is used for classifying the structured data in the ontology base, the knowledge base, the expert base, the note base, the forum base and the question-answer base, quantizing the data to form generalized sample templates and corresponding generalized samples, configuring a graph traversal query statement for each generalized sample template, and training a structured data semantic generalization model on the generalized samples based on a naive Bayes model, to obtain a trained structured data semantic generalization model;
the data knowledge graph building module is used for extracting triples from the unstructured data in the knowledge warehouse by using a dependency syntax model, extracting entity-relationship data, and building an unstructured data knowledge graph;
the full-text index building module is used for configuring a full-text retrieval schema for the knowledge warehouse, building an index for key information of the structured data, and building a full-text retrieval index for the unstructured data;
the four-layer result feedback model building module operates as follows: a user inputs a question or keyword to be retrieved on a search page; the similarity between the question to be retrieved and the existing questions in the question-answer base is calculated by the trained question-answer pair model, and if the similarity is greater than a threshold a, an answer construction module pushes the answer to the user; otherwise, the query is passed to the structured data semantic generalization model, the similarity between the question to be retrieved and the existing generalized sample templates is calculated, and if the similarity is greater than a threshold b, the answer construction module pushes the answer to the user; otherwise, the query is passed to the unstructured data knowledge graph, and if an unstructured entity node with the same name exists, the answers related to that node are pushed to the user; if no such node exists, the query is passed to the full-text retrieval schema, and a matching answer is searched for in the full text;
the evaluation feedback module operates as follows: after obtaining the answer, the user can give evaluation feedback on it; if the feedback indicates that the question is solved, the user's profile is updated according to the question, and questions the user may be interested in and follow-up question guidance are generated; if the feedback indicates that the question is not solved, the question is recorded in an unknown-question base to await maintenance by a business expert.
CN202211734224.6A 2022-12-30 2022-12-30 Intelligent question-answering method and system based on four-layer feature vector matching model Pending CN115964466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211734224.6A CN115964466A (en) 2022-12-30 2022-12-30 Intelligent question-answering method and system based on four-layer feature vector matching model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211734224.6A CN115964466A (en) 2022-12-30 2022-12-30 Intelligent question-answering method and system based on four-layer feature vector matching model

Publications (1)

Publication Number Publication Date
CN115964466A true CN115964466A (en) 2023-04-14

Family

ID=87354578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211734224.6A Pending CN115964466A (en) 2022-12-30 2022-12-30 Intelligent question-answering method and system based on four-layer feature vector matching model

Country Status (1)

Country Link
CN (1) CN115964466A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332851A (en) * 2023-09-08 2024-01-02 珠海盈米基金销售有限公司 LLM question-answering platform construction method and system based on private knowledge base
CN117332851B (en) * 2023-09-08 2024-05-31 珠海盈米基金销售有限公司 LLM question-answering platform construction method and system based on private knowledge base

Similar Documents

Publication Publication Date Title
Sheth et al. Semantics for the semantic web: The implicit, the formal and the powerful
US9715493B2 (en) Method and system for monitoring social media and analyzing text to automate classification of user posts using a facet based relevance assessment model
CN102087669B (en) Intelligent search engine system based on semantic association
Remi et al. Domain ontology driven fuzzy semantic information retrieval
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
Carta et al. Iterative zero-shot llm prompting for knowledge graph construction
Sbattella et al. A novel semantic information retrieval system based on a three-level domain model
CN114090861A (en) Education field search engine construction method based on knowledge graph
CN117312499A (en) Big data analysis system and method based on semantics
Ali et al. Named entity recognition using deep learning: A review
CN115964466A (en) Intelligent question-answering method and system based on four-layer feature vector matching model
Al-Obaydy et al. Document classification using term frequency-inverse document frequency and K-means clustering
Subiksha Improvement in analyzing healthcare systems using deep learning architecture
Wu et al. The CRFs-based Chinese open entity relation extraction
CN111581326B (en) Method for extracting answer information based on heterogeneous external knowledge source graph structure
Omri et al. Dynamic editing distance-based extracting relevant information approach from social networks
Chen et al. Research on knowledge graph application technology
Ayvaz et al. Using RDF summary graph for keyword-based semantic searches
Lin et al. Research and application of knowledge graph technology for intelligent question answering
Yadav et al. Efficient retrieval of data using semantic search engine based on NLP and RDF
Dai et al. Construction of Visual Question and Answering System Based on Knowledge Graph for Specific Objects
Guruvayur et al. Automatic relationship construction in domain ontology engineering using semantic and thematic graph generation process and convolution neural network
Shivashankar et al. Reaching out for the Answer: Answer Type Prediction.
Koukaras et al. Introducing a novel bi-functional method for exploiting sentiment in complex information networks
CN113886535B (en) Knowledge graph-based question and answer method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination