CN117033571A - Knowledge question-answering system construction method and system - Google Patents

Knowledge question-answering system construction method and system

Info

Publication number
CN117033571A
Authority
CN
China
Prior art keywords
entities
knowledge
model
relation
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310765310.1A
Other languages
Chinese (zh)
Inventor
李志芸
冯落落
李晓瑜
李沛
张庆功
尹青山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong New Generation Information Industry Technology Research Institute Co Ltd
Original Assignee
Shandong New Generation Information Industry Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong New Generation Information Industry Technology Research Institute Co Ltd
Priority to CN202310765310.1A
Publication of CN117033571A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a system for constructing a knowledge question-answering system. It belongs to the technical field of big data processing and addresses the technical problem of how to construct a knowledge question-answering system by combining a large model with a knowledge graph. The method comprises the following steps: collecting and organizing knowledge data related to the chemical field, extracting entities, relationships and attributes, and constructing a knowledge graph; analyzing and understanding the question text input by the user, and extracting entities, relationships and attributes; performing information retrieval in the knowledge graph according to the keywords and entities in the question text; integrating the question text input by the user with the retrieved entities, relationships and attributes to obtain a prompt; taking the prompt as input, generating a corresponding answer with an answer prediction model constructed with large model technology; and, based on the interface requirements of the user, presenting the generated answer in a formatted manner, including in the form of text and charts.

Description

Knowledge question-answering system construction method and system
Technical Field
The application relates to the technical field of big data processing, in particular to a knowledge question-answering system construction method and system.
Background
Large models acquire a large amount of knowledge and information by training on large-scale text data. They offer many advantages: they can extract information from text in many fields, including science, history, literature and technology, and can answer many types of questions. Large language models have strong language understanding and generation capabilities, so when faced with a user's question they can generate fluent, accurate answers and can personalize and adapt their responses to the user's input and context. They also have learning capability: through interaction with users they can keep improving, which raises the accuracy and quality of answers and lets the model better meet users' needs. Large models are therefore well suited to question-answering systems.
Despite these advantages, large models also face challenges such as misleading answers, inconsistency across a conversation, and data bias. In specialized vertical fields in particular, such as the water conservancy industry, answers must be grounded in existing data, their truthfulness must be ensured, and content must not be fabricated.
A knowledge graph is a graph structure used to organize and represent knowledge. It is a knowledge base containing entities, attributes and the relationships between them. In a knowledge graph, entities represent specific real-world objects or concepts, attributes describe the features of entities, and relationships describe how entities are connected. A knowledge graph integrates domain knowledge into a unified structure so that a computer can understand and process it. It can extract, link and organize information from multiple sources to build a rich knowledge network.
The information in a knowledge graph usually comes from reliable data sources or expert annotation and is strictly verified and reviewed. This gives knowledge graphs an advantage in data reliability and controllability. In contrast, large models acquire knowledge automatically by training on large-scale text data, which makes the accuracy and reliability of that data hard to guarantee.
An industry knowledge graph draws mainly on data from a particular field or enterprise. It usually needs to scale quickly to build an industry barrier, and its knowledge structure is more complex, typically including ontology engineering and rule-based knowledge. The quality requirements for knowledge extraction are very high: extraction relies more on joint extraction from structured, semi-structured and unstructured data inside the enterprise, and manual review is needed to ensure quality. Fusing multi-source domain data is often required and is an effective way to scale the data. The applications are also broader: besides search and question answering, they include decision analysis and business management, which place higher demands on reasoning and interpretability. The main fields are e-commerce, finance, agriculture, security, medical care and the like.
How to combine the large model and the knowledge graph to construct the knowledge question-answering system is a technical problem to be solved.
Disclosure of Invention
The technical task of the application is to address the above deficiencies by providing a knowledge question-answering system construction method and system, so as to solve the technical problem of how to construct a knowledge question-answering system by combining a large model with a knowledge graph.
In a first aspect, the application provides a knowledge question-answering system construction method for constructing a knowledge question-answering system in the chemical field based on knowledge graph, LangChain and large model technology, comprising the following steps:
collecting and organizing knowledge data related to the chemical field, preprocessing the knowledge data through natural language processing technology, extracting entities, relationships and attributes, and constructing a knowledge graph based on the extracted entities, relationships and attributes, wherein a relationship is a semantic relationship between entities, and an attribute is descriptive information about an entity, including its characteristics and properties;
analyzing and understanding the question text input by the user through natural language processing technology, and extracting entities, relationships and attributes;
performing information retrieval in the knowledge graph according to the keywords and entities in the question text to obtain related entities, relationships and attributes;
integrating the question text input by the user with the retrieved entities, relationships and attributes to obtain a prompt;
taking the prompt as input, generating a corresponding answer based on an answer prediction model constructed with large model technology;
based on the interface requirements of the user, presenting the generated answer in a formatted manner, including in the form of text and charts.
Preferably, the knowledge data includes structured data and unstructured data;
for the structured data, entities, relationships and attributes are extracted by means of entity modeling, relationship modeling and triple storage;
for the unstructured data, entities, relationships and attributes are extracted by means of entity extraction and relation extraction.
Preferably, entity extraction is performed by:
performing rule-based regular-expression matching to recognize named entities;
or, treating named entity recognition as a sequence labeling problem based on a statistical model, wherein the statistical model comprises a hidden Markov model, a conditional Markov model and a conditional random field model;
or, taking the word vectors of the question text as input and implementing end-to-end named entity recognition based on a neural network model;
relation extraction is performed by a rule-based method or a machine learning-based method;
relation extraction by the rule-based method comprises: extracting semantic relationships between entities by identifying grammatical structures and context information in the question text using predefined rules and pattern matching techniques;
relation extraction by the machine learning-based method comprises: training a relation extraction model using a supervised or unsupervised learning algorithm, and identifying and extracting semantic relationships between entities from the question text based on the trained relation extraction model;
attribute extraction is performed by the following steps:
performing feature extraction on the question text based on a rule-based matching method, supervised learning, semi-supervised learning or a deep learning method;
identifying and extracting attributes from the extracted features through a preconfigured classification model or sequence labeling model;
the rule-based matching method comprises rule-based pattern matching and rule-based keyword matching;
when the deep learning method is used, feature extraction is performed on the question text through a trained BERT model.
Preferably, the constructed knowledge graph is stored in a graph database;
according to the keywords and entities in the question text, retrieval is performed on the knowledge graph through the query language of the graph database, and the entities, relationships and attributes related to the question text are returned.
Preferably, the answer prediction model is a model constructed based on ChatGPT, ChatGLM or Wenxin Yiyan (ERNIE Bot).
In a second aspect, the present application provides a knowledge question-answering system construction system for constructing a knowledge question-answering system in the chemical field by the knowledge question-answering system construction method according to any implementation of the first aspect, the construction system comprising:
the knowledge graph construction module, which is used for collecting and organizing knowledge data related to the chemical field, preprocessing the knowledge data through natural language processing technology, extracting entities, relationships and attributes, and constructing a knowledge graph based on the extracted entities, relationships and attributes, wherein a relationship is a semantic relationship between entities, and an attribute is descriptive information about an entity, including its characteristics and properties;
the extraction module, which is used for analyzing and understanding the question text input by the user through natural language processing technology and extracting entities, relationships and attributes;
the retrieval matching module, which is used for performing information retrieval in the knowledge graph according to the keywords and entities in the question text to obtain related entities, relationships and attributes;
the information integration module, which is used for integrating the question text input by the user with the retrieved entities, relationships and attributes to obtain a prompt;
the answer generation module, which is used for taking the prompt as input and generating a corresponding answer based on an answer prediction model constructed with large model technology;
and the answer display module, which is used for presenting the generated answer in a formatted manner based on the interface requirements of the user, including in the form of text and charts.
Preferably, the knowledge data includes structured data and unstructured data;
for the structured data, the knowledge graph construction module is used for extracting entities, relationships and attributes by means of entity modeling, relationship modeling and triple storage;
for the unstructured data, the knowledge graph construction module is used for extracting entities, relationships and attributes by means of entity extraction and relation extraction.
Preferably, the extraction module is configured to perform entity extraction by:
performing rule-based regular-expression matching to recognize named entities;
or, treating named entity recognition as a sequence labeling problem based on a statistical model, wherein the statistical model comprises a hidden Markov model, a conditional Markov model and a conditional random field model;
or, taking the word vectors of the question text as input and implementing end-to-end named entity recognition based on a neural network model;
the extraction module is used for performing relation extraction by a rule-based method or a machine learning-based method;
relation extraction by the rule-based method comprises: extracting semantic relationships between entities by identifying grammatical structures and context information in the question text using predefined rules and pattern matching techniques;
relation extraction by the machine learning-based method comprises: training a relation extraction model using a supervised or unsupervised learning algorithm, and identifying and extracting semantic relationships between entities from the question text based on the trained relation extraction model;
the extraction module is used for performing attribute extraction through the following steps:
performing feature extraction on the question text based on a rule-based matching method, supervised learning, semi-supervised learning or a deep learning method;
identifying and extracting attributes from the extracted features through a preconfigured classification model or sequence labeling model;
the rule-based matching method comprises rule-based pattern matching and rule-based keyword matching;
when the deep learning method is used, feature extraction is performed on the question text through a trained BERT model.
Preferably, the knowledge graph is stored in a graph database;
the retrieval matching module is used for executing the following step: according to the keywords and entities in the question text, performing retrieval on the knowledge graph through the query language of the graph database, and returning the entities, relationships and attributes related to the question text.
Preferably, the answer prediction model is a model constructed based on ChatGPT, ChatGLM or Wenxin Yiyan (ERNIE Bot).
The knowledge question-answering system construction method and system have the following advantages: a question-answering system based on knowledge graph technology can exploit the knowledge graph's structured knowledge representation, fusion of specialized domain knowledge and flexible information querying, while the large model contributes strong context understanding, multi-domain knowledge coverage, reasoning capability and language generation capability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
The application is further described below with reference to the accompanying drawings.
FIG. 1 is a general framework of a knowledge graph;
FIG. 2 is a flow chart of the knowledge question-answering system construction method of Example 1.
Detailed Description
The application will be further described with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand the application and implement it, but the examples are not meant to limit the application, and the technical features of the embodiments of the application and the examples can be combined with each other without conflict.
The embodiment of the application provides a knowledge question-answering system construction method and a knowledge question-answering system construction system, which are used for solving the technical problem of how to construct the knowledge question-answering system by combining a large model and a knowledge graph.
Example 1:
The application discloses a knowledge question-answering system construction method for constructing a knowledge question-answering system in the chemical field based on knowledge graph, LangChain and large model technology, comprising the following steps:
S100, collecting and organizing knowledge data related to the chemical field, preprocessing the knowledge data through natural language processing technology, extracting entities, relationships and attributes, and constructing a knowledge graph based on the extracted entities, relationships and attributes, wherein a relationship is a semantic relationship between entities, and an attribute is descriptive information about an entity, including its characteristics and properties;
S200, analyzing and understanding the question text input by the user through natural language processing technology, and extracting entities, relationships and attributes;
S300, performing information retrieval in the knowledge graph according to the keywords and entities in the question text to obtain related entities, relationships and attributes;
S400, integrating the question text input by the user with the retrieved entities, relationships and attributes to obtain a prompt;
S500, taking the prompt as input, and generating a corresponding answer based on an answer prediction model constructed with large model technology;
S600, presenting the generated answer in a formatted manner based on the interface requirements of the user, including in the form of text and charts.
The knowledge data collected in step S100 of this embodiment includes structured data and unstructured data. For the structured data, entities, relationships and attributes are extracted by means of entity modeling, relationship modeling and triple storage; for the unstructured data, entities, relationships and attributes are extracted by means of entity extraction and relation extraction.
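The triple storage mentioned above for structured data can be sketched as follows. This is a minimal illustration, not the method of the application: the `Triple` type, the `rows_to_triples` helper, the column names and the `rel:`/`attr:` prefixes are all assumptions introduced for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def rows_to_triples(rows, key, relation_columns):
    """Turn structured records into (head, relation, tail) triples.

    `key` names the column that holds the entity. Every other non-empty
    column becomes a relation (if listed in `relation_columns`) or an
    attribute of that entity.
    """
    triples = []
    for row in rows:
        head = row[key]
        for column, value in row.items():
            if column == key or value in (None, ""):
                continue
            prefix = "rel" if column in relation_columns else "attr"
            triples.append(Triple(head, f"{prefix}:{column}", str(value)))
    return triples


# Illustrative records (hypothetical table layout).
rows = [
    {"name": "ethanol", "formula": "C2H6O", "produced_from": "ethylene"},
    {"name": "ethylene", "formula": "C2H4", "produced_from": ""},
]
triples = rows_to_triples(rows, key="name", relation_columns={"produced_from"})
for triple in triples:
    print(triple)
```

Keeping relations and attributes in the same triple shape makes it straightforward to load them into a graph database later, with attribute triples mapped to node properties.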
The knowledge graph constructed in this embodiment is stored in a graph database. In practical applications, other storage structures may be used as needed.
The entity extraction in step S200 is performed in one of the following ways: rule-based regular-expression matching for named entity recognition; treating named entity recognition as a sequence labeling problem based on a statistical model, where the statistical models include the hidden Markov model, the conditional Markov model and the conditional random field model; or taking the word vectors of the question text as input and implementing end-to-end named entity recognition with a neural network model, which no longer depends on manually defined features.
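The first of these options, rule-based regular-expression matching, can be sketched as below. The two patterns and the labels `FORMULA` and `TEMPERATURE` are hypothetical and deliberately crude (the formula pattern would also match all-caps abbreviations); a real system would curate its own rule set.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    # Two or more element-like tokens, e.g. C2H5OH or NaCl.
    "FORMULA": re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b"),
    # A number followed by a Celsius unit, e.g. 100 °C.
    "TEMPERATURE": re.compile(r"-?\d+(?:\.\d+)?\s?°C"),
}


def rule_based_ner(text):
    """Return (surface form, label, start, end) tuples sorted by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity[2])


entities = rule_based_ner("Does C2H5OH boil below 100 °C?")
print(entities)
```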
Relation extraction extracts semantic relationships between two or more entities from text. It is closely related to entity extraction: generally, after the entities in the text are identified, the relationships that may exist between them are extracted. Rule-based relation extraction uses predefined rules and pattern matching techniques to identify grammatical structures and contextual information in the question text and thereby extract semantic relationships between entities. Machine learning-based relation extraction trains a relation extraction model with a supervised or unsupervised learning algorithm, then uses the trained model to identify and extract semantic relationships between entities from the question text.
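A minimal sketch of the rule-based variant follows, assuming entities have already been recognized. The surface patterns and relation labels are hypothetical; they only illustrate how predefined patterns can map phrasing to a relation between two known entities.

```python
import re

# Hypothetical surface patterns; each maps a phrasing to a relation label.
RELATION_PATTERNS = [
    (re.compile(r"(?P<head>\w+) is produced from (?P<tail>\w+)"), "produced_from"),
    (re.compile(r"(?P<head>\w+) reacts with (?P<tail>\w+)"), "reacts_with"),
]


def extract_relations(text, known_entities):
    """Return (head, relation, tail) triples found by pattern matching."""
    triples = []
    for pattern, relation in RELATION_PATTERNS:
        for match in pattern.finditer(text):
            head, tail = match.group("head"), match.group("tail")
            # Keep a match only when both ends are already-recognized entities.
            if head in known_entities and tail in known_entities:
                triples.append((head, relation, tail))
    return triples


text = "Ethanol is produced from ethylene. Sodium reacts with water."
known = {"Ethanol", "ethylene", "Sodium", "water"}
relations = extract_relations(text, known)
print(relations)
```

Filtering on `known_entities` reflects the order described above: entities are identified first, and only relationships between identified entities are kept.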
Attributes are typically features, properties, or other descriptive information describing an entity, such as coordinates of a location, etc. In this embodiment, attribute extraction is performed by:
(1) Feature extraction is performed on the question text using a rule-based matching method, supervised learning, semi-supervised learning or a deep learning method;
(2) Attributes are identified and extracted from the extracted features by a preconfigured classification model or sequence labeling model.
The rule-based matching method includes rule-based pattern matching and rule-based keyword matching. When the deep learning method is used, feature extraction is performed on the question text through a trained BERT model.
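The two steps above can be sketched with keyword matching as the feature extractor and a trivial threshold rule standing in for the preconfigured classification model. The attribute vocabulary, function names and threshold are assumptions made for this example; a trained classifier or a BERT-based feature extractor would replace them in practice.

```python
# Hypothetical attribute vocabulary: attribute name -> trigger keywords.
ATTRIBUTE_KEYWORDS = {
    "boiling_point": ("boiling point", "boils"),
    "density": ("density", "dense"),
    "molar_mass": ("molar mass", "molecular weight"),
}


def keyword_features(question):
    """Step (1): count keyword hits per attribute (rule-based keyword matching)."""
    lowered = question.lower()
    return {
        attribute: sum(lowered.count(keyword) for keyword in keywords)
        for attribute, keywords in ATTRIBUTE_KEYWORDS.items()
    }


def classify_attributes(features, threshold=1):
    """Step (2): a stand-in classifier that keeps attributes above a threshold."""
    return [attribute for attribute, score in features.items() if score >= threshold]


question = "What is the boiling point and density of ethanol?"
features = keyword_features(question)
attributes = classify_attributes(features)
print(features, attributes)
```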
In step S300, information retrieval is performed in the knowledge graph according to the keywords and entities in the question text to find the related entities, relationships, attributes and the like. In this embodiment, the knowledge graph is stored in a graph database; retrieval can use the graph database's query language (such as Cypher) or another search algorithm, and returns the information fragments related to the question, where the information fragments are the entities, relationships and attributes related to the keywords and entities in the question text.
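As a rough illustration of retrieval through Cypher, the sketch below builds a parameterized query for the entities found in the question. The `Entity` label, the `name` property and the `build_cypher` helper are assumptions; the application does not specify a graph schema, and no database is contacted here.

```python
def build_cypher(entity_names, limit=25):
    """Build a parameterized Cypher query and its parameters.

    Assumes a hypothetical schema in which every node carries the label
    `Entity` and a `name` property.
    """
    query = (
        "MATCH (e:Entity)-[r]-(n) "
        "WHERE e.name IN $names "
        "RETURN e.name AS entity, type(r) AS relation, n.name AS neighbor, "
        "properties(e) AS attributes "
        "LIMIT $limit"
    )
    return query, {"names": list(entity_names), "limit": limit}


query, params = build_cypher(["ethanol"])
print(query)
print(params)
```

Passing entity names as parameters rather than formatting them into the query string avoids injection problems when the names come from user questions.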
Step S400 integrates the question text input by the user in step S200 with the information retrieved in step S300 to generate a prompt, which is then input into the large model. The prompt template can be designed according to the actual situation, for example: "Known information: {the information retrieved in step S300}. Based on the known information above, answer the user's question concisely and professionally. If the answer cannot be obtained from it, say "the question cannot be answered according to the known information" or "sufficient relevant information is not provided". Adding fabricated content to the answer is not allowed, and the answer should be in Chinese. The question is: {the user's question from step S200}".
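A template of this kind can be filled in with plain string formatting. The sketch below paraphrases the example template in English; the `build_prompt` helper and the `head | relation | tail` rendering of retrieved facts are assumptions made for illustration.

```python
PROMPT_TEMPLATE = (
    "Known information:\n{context}\n\n"
    "Based on the known information above, answer the user's question concisely "
    "and professionally. If the answer cannot be obtained from it, say "
    "\"The question cannot be answered according to the known information\" or "
    "\"Sufficient relevant information is not provided\". Do not add fabricated "
    "content to the answer. Answer in Chinese.\n\n"
    "The question is: {question}"
)


def build_prompt(question, triples):
    """Render retrieved (head, relation, tail) facts and the question into a prompt."""
    context = "\n".join(f"{head} | {relation} | {tail}" for head, relation, tail in triples)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "What is ethanol produced from?",
    [("ethanol", "produced_from", "ethylene")],
)
print(prompt)
```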
Step S500 invokes the large model to generate an answer. It uses the natural advantages of the large model, such as context understanding, multi-domain knowledge coverage, zero-shot learning and language generation, to generate an answer from the input prompt. The large model may be ChatGPT, ChatGLM, Wenxin Yiyan (ERNIE Bot) or the like.
Step S600 presents the generated answer in a format that matches the interface requirements of the user, for example as text, a chart or another form.
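One way to picture the formatted presentation is a small dispatcher keyed on the requested format. The format names `text`, `table` and `json` and the `present` function are assumptions; the JSON branch stands in for a payload that a front-end charting component could consume, which the application does not specify.

```python
import json


def present(answer, facts, fmt="text"):
    """Format an answer for display; `facts` maps field names to values."""
    if fmt == "text":
        return answer
    if fmt == "table":
        # Align field names in a fixed-width column under the answer.
        width = max(len(name) for name in facts)
        rows = [f"{name.ljust(width)} | {value}" for name, value in facts.items()]
        return "\n".join([answer, *rows])
    if fmt == "json":
        # Structured payload that a charting front end could render.
        return json.dumps({"answer": answer, "facts": facts}, ensure_ascii=False)
    raise ValueError(f"unsupported format: {fmt}")


facts = {"formula": "C2H6O", "boiling_point": "78 °C"}
rendered = present("Ethanol boils at about 78 °C.", facts, fmt="table")
print(rendered)
```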
Example 2:
The application discloses a knowledge question-answering system construction system, which comprises a knowledge graph construction module, an extraction module, a retrieval matching module, an information integration module, an answer generation module and an answer display module. The system constructs a knowledge question-answering system in the chemical field by the method disclosed in Example 1.
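The online modules listed above can be pictured as interchangeable callables wired into one pipeline. This is only a sketch under stated assumptions: the `QAPipeline` class, its field names and the toy stand-ins (including the stub in place of a large model) are hypothetical, and the knowledge graph construction module is omitted because it runs offline before any question is asked.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class QAPipeline:
    extract: Callable[[str], List[str]]            # extraction module
    retrieve: Callable[[List[str]], List[Triple]]  # retrieval matching module
    integrate: Callable[[str, List[Triple]], str]  # information integration module
    generate: Callable[[str], str]                 # answer generation module
    display: Callable[[str], str]                  # answer display module

    def answer(self, question: str) -> str:
        entities = self.extract(question)
        facts = self.retrieve(entities)
        prompt = self.integrate(question, facts)
        return self.display(self.generate(prompt))


# Toy stand-ins so the sketch runs without a graph database or a large model.
GRAPH = [("ethanol", "produced_from", "ethylene")]
pipeline = QAPipeline(
    extract=lambda q: [word.strip("?.,").lower() for word in q.split()],
    retrieve=lambda entities: [t for t in GRAPH if t[0] in entities],
    integrate=lambda q, facts: f"Known: {facts}\nQuestion: {q}",
    generate=lambda prompt: "ethylene" if "ethylene" in prompt else "unknown",
    display=lambda text: text.strip(),
)
print(pipeline.answer("What is ethanol produced from?"))
```

Keeping each module behind a plain callable makes it easy to swap a rule-based extractor for a learned one, or one large model for another, without touching the rest of the pipeline.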
The knowledge graph construction module is used for collecting and organizing knowledge data related to the chemical field, preprocessing the knowledge data through natural language processing technology, extracting entities, relationships and attributes, and constructing a knowledge graph based on the extracted entities, relationships and attributes, wherein a relationship is a semantic relationship between entities, and an attribute is descriptive information about an entity, including its characteristics and properties.
In this embodiment, the knowledge data includes structured data and unstructured data. For the structured data, the knowledge graph construction module extracts entities, relationships and attributes by means of entity modeling, relationship modeling and triple storage; for the unstructured data, the knowledge graph construction module extracts entities, relationships and attributes by means of entity extraction and relation extraction.
The extraction module is used for analyzing and understanding the question text input by the user through natural language processing technology and extracting entities, relations and attributes.
The extraction module is used for extracting entities by one of the following methods: performing rule-based regular-expression matching to recognize named entities; or treating named entity recognition as a sequence labeling problem based on a statistical model, wherein the statistical model comprises a hidden Markov model, a conditional Markov model and a conditional random field model; or using the word vectors of the question text as features and realizing end-to-end named entity recognition based on a neural network model.
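By way of non-limiting illustration, the rule-based option can be sketched with regular expressions. The gazetteer `COMPOUND_NAMES`, the simplified formula pattern and the labels `COMPOUND` and `FORMULA` are assumptions made for this sketch; a practical rule set for the chemical field would be considerably larger, and the formula pattern below will also match ordinary all-capital abbreviations.

```python
import re

# Hypothetical gazetteer of compound names used by the rule set.
COMPOUND_NAMES = ["sulfuric acid", "sodium hydroxide", "ethanol"]
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COMPOUND_NAMES) + r")\b",
    re.IGNORECASE,
)
# Simplified formula rule: two or more element-symbol groups with optional
# counts, for example H2SO4 or NaOH.
FORMULA_PATTERN = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def find_entities(text: str) -> list:
    """Return (mention, label, start_offset) tuples in order of appearance."""
    entities = []
    for match in NAME_PATTERN.finditer(text):
        entities.append((match.group(0).lower(), "COMPOUND", match.start()))
    for match in FORMULA_PATTERN.finditer(text):
        entities.append((match.group(0), "FORMULA", match.start()))
    return sorted(entities, key=lambda entity: entity[2])


entities = find_entities("What happens when sulfuric acid reacts with NaOH?")
```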
The extraction module is used for relation extraction by a rule-based method or a machine learning-based method. Rule-based relation extraction comprises: using predefined rules and pattern matching techniques to identify grammatical structures and contextual information in the question text and thereby extract the semantic relationships between entities. Machine learning-based relation extraction comprises: training a relation extraction model with a supervised or unsupervised learning algorithm, and identifying and extracting the semantic relationships between entities from the question text with the trained model.
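By way of non-limiting illustration, rule-based relation extraction can be sketched as a lookup of trigger phrases in the text between two adjacent entity mentions. The `TRIGGERS` table, the relation labels and the `extract_relations` helper are assumptions introduced for this sketch; the embodiment does not specify particular rules.

```python
# Hypothetical trigger phrases mapped to relation labels.
TRIGGERS = {
    "reacts with": "reacts_with",
    "is produced from": "produced_from",
    "dissolves in": "soluble_in",
}


def extract_relations(text: str, entities: list) -> list:
    """Return (head, relation, tail) for adjacent entity mentions joined by a trigger."""
    lowered = text.lower()
    # Locate the first mention of each known entity and order mentions by position.
    mentions = sorted(
        (lowered.find(entity.lower()), entity)
        for entity in entities
        if entity.lower() in lowered
    )
    relations = []
    for (start_a, head), (start_b, tail) in zip(mentions, mentions[1:]):
        between = lowered[start_a + len(head):start_b]
        for phrase, label in TRIGGERS.items():
            if phrase in between:
                relations.append((head, label, tail))
    return relations


relations = extract_relations(
    "Sodium hydroxide reacts with hydrochloric acid to form a salt.",
    ["hydrochloric acid", "sodium hydroxide"],
)
```

The sketch assumes the entities were already found in an earlier step and only considers the first mention of each one.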
The extraction module is used for extracting attributes through the following steps:
(1) Performing feature extraction on the question text using a rule-based matching method, a supervised or semi-supervised learning method, or a deep learning method;
(2) Identifying and extracting attributes from the extracted features with a preconfigured classification model or sequence labeling model.
The rule-based matching method comprises rule-based pattern matching and rule-based keyword matching; when the deep learning method is used, feature extraction is performed on the question text with a trained BERT model.
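By way of non-limiting illustration, the rule-based keyword matching option can be sketched as follows. The `ATTRIBUTE_KEYWORDS` table and its attribute names are assumptions introduced here, and the sketch collapses feature extraction and classification into a single lookup, which is simpler than the two-step procedure described above.

```python
# Hypothetical keyword rules: attribute name -> phrases that signal it.
ATTRIBUTE_KEYWORDS = {
    "boiling_point": ["boiling point", "boil"],
    "melting_point": ["melting point", "melt"],
    "density": ["density", "how dense"],
}


def extract_attributes(question: str) -> list:
    """Return the attribute names whose keywords occur in the question."""
    lowered = question.lower()
    return [
        attribute
        for attribute, keywords in ATTRIBUTE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


attributes = extract_attributes("What is the boiling point and density of ethanol?")
```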
The retrieval matching module is used for retrieving information from the knowledge graph according to the keywords and entities in the question text to obtain the related entities, relations and attributes.
In this embodiment, the knowledge graph is stored in a graph database, and the retrieval matching module is configured to perform the following step: according to the keywords and entities in the question text, retrieving from the knowledge graph through the query language of the graph database, and returning the entities, relations and attributes related to the question text.
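By way of non-limiting illustration, the retrieval step can be sketched as the construction of a parameterized query. The embodiment does not name a graph database or query language; the sketch assumes Cypher, assumes that every node carries a `name` property, and introduces the hypothetical helper `build_retrieval_query`. Executing the query against a database is outside the sketch.

```python
def build_retrieval_query(entities: list, limit: int = 25) -> tuple:
    """Build a parameterized Cypher query for the neighbors of the given entities."""
    query = (
        "MATCH (e)-[r]-(n) "
        "WHERE e.name IN $names "
        "RETURN e.name AS entity, type(r) AS relation, "
        "n.name AS neighbor, properties(e) AS attributes "
        "LIMIT $limit"
    )
    # Parameters are passed separately so entity names are never
    # concatenated into the query text.
    params = {"names": sorted(set(entities)), "limit": limit}
    return query, params


query, params = build_retrieval_query(["ethanol", "ethanol", "water"])
```

Passing the entity names as parameters rather than formatting them into the string avoids query injection from user-supplied question text.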
The information integration module is used for integrating the question text input by the user with the retrieved entities, relations and attributes to obtain the prompt.
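By way of non-limiting illustration, information integration can be sketched as filling a fixed template with the question and the retrieved facts. The template wording and the `build_prompt` helper are assumptions introduced for this sketch; the embodiment does not prescribe a prompt format.

```python
def build_prompt(question: str, triples: list, attributes: list) -> str:
    """Combine the user question with retrieved facts into one prompt string."""
    fact_lines = [f"- {head} {relation} {tail}" for head, relation, tail in triples]
    fact_lines += [
        f"- {entity} has {name} = {value}" for entity, name, value in attributes
    ]
    facts = "\n".join(fact_lines) if fact_lines else "- (no matching facts found)"
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the boiling point of ethanol?",
    [("ethanol", "produced_from", "ethylene")],
    [("ethanol", "boiling_point_c", "78.4")],
)
```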
The answer generation module is used for taking the prompt as input and generating a corresponding answer with an answer prediction model constructed on large model technology.
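By way of non-limiting illustration, the answer generation module can be sketched as a thin wrapper around any callable that maps a prompt to text. No real model client is called here: `stub_model` is a hypothetical stand-in, and `generate_answer` is a helper introduced for this sketch.

```python
from typing import Callable


def generate_answer(prompt: str, model: Callable[[str], str]) -> str:
    """Send the prompt to the model and return a cleaned-up answer."""
    answer = model(prompt).strip()
    # Return an explicit message rather than an empty string.
    return answer if answer else "No answer could be generated."


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a real large-model client.
    return "  Ethanol boils at about 78 degrees Celsius.  "


answer = generate_answer("What is the boiling point of ethanol?", stub_model)
```

Injecting the model as a callable keeps the module independent of which large model is selected.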
The answer display module is used for presenting the generated answer in a format based on the user's interface requirements, including display in the form of text and charts.
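By way of non-limiting illustration, formatted presentation can be sketched as returning either plain text or the text followed by a simple text table. The `format_answer` helper, the `style` parameter and the table layout are assumptions introduced for this sketch; rendering a graphical chart is outside its scope.

```python
def format_answer(answer: str, rows: list = None, style: str = "text") -> str:
    """Return the answer as plain text, optionally followed by a text table."""
    if style != "table" or not rows:
        return answer
    headers = list(rows[0].keys())
    # Each column is as wide as its longest cell or its header.
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]

    def fmt(cells):
        padded = (str(cell).ljust(width) for cell, width in zip(cells, widths))
        return " | ".join(padded).rstrip()

    lines = [fmt(headers), "-+-".join("-" * width for width in widths)]
    lines += [fmt([row[h] for h in headers]) for row in rows]
    return answer + "\n" + "\n".join(lines)


output = format_answer(
    "Ethanol boils at 78.4 C.",
    rows=[{"name": "ethanol", "bp_c": 78.4}],
    style="table",
)
```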
The inherent strengths of the large model, such as contextual understanding, multi-domain knowledge coverage, zero-shot learning and language generation, are used to generate an answer from the input prompt. The answer prediction model in this embodiment is a model constructed based on ChatGPT, ChatGLM or ERNIE Bot (Wenxin Yiyan).
While the application has been illustrated and described in detail in the drawings and in the preferred embodiments, the application is not limited to the disclosed embodiments, and it will be appreciated by those skilled in the art that the code audits of the various embodiments described above may be combined to produce further embodiments of the application, which are also within the scope of the application.

Claims (10)

1. The knowledge question-answering system construction method is characterized by constructing a knowledge question-answering system in the chemical field based on knowledge graphs, langchain and large model technology, and comprises the following steps:
collecting and arranging knowledge data related to the chemical field, preprocessing the knowledge data through a natural language processing technology, extracting the relationship among the entities, the relationship and the attributes, and constructing a knowledge graph based on the relationship among the entities, the relationship and the attributes, wherein the relationship is a semantic relationship among the entities, and the attributes are descriptive information for describing the entities, including describing the characteristics and the properties of the entities;
analyzing and understanding the question text input by the user through a natural language processing technology, and extracting entities, relations and attributes;
according to the keywords and the entities in the question text, information retrieval is carried out in the knowledge graph to obtain related entities, relations and attributes;
integrating the question text input by the user with the retrieved entities, relations and attributes to obtain a prompt;
generating a corresponding answer based on an answer prediction model constructed by a large model technology by taking a prompt as an input;
based on the interface requirements of the user, the generated answers are formatted and presented, including being presented in the form of texts and charts.
2. The knowledge question-answering system construction method according to claim 1, wherein the knowledge data includes structured data and unstructured data;
for the structured data, extracting the relationship among the entity, the relationship and the attribute in the modes of entity modeling, relationship modeling and triplet storage;
for unstructured data, the relationships among entities, relationships and attributes are extracted by means of entity extraction and relationship extraction.
3. The knowledge question-answering system construction method according to claim 1, wherein the entity is extracted by:
performing regular matching based on rules to identify named entities;
or, treating the named entity recognition as a sequence labeling problem based on a statistical model, wherein the statistical model comprises a hidden Markov model, a conditional Markov model and a conditional random field model;
or, using the word vectors of the question text as features and realizing end-to-end named entity recognition based on a neural network model;
extracting the relation by a rule-based method or a machine learning-based method;
the relation extraction is carried out by a rule-based method, which comprises the following steps: extracting semantic relationships between entities by identifying grammatical structures and context information in the question text using predefined rules and pattern matching techniques;
the relation extraction is carried out by a machine learning-based method, comprising the following steps: training a relation extraction model by using a supervised learning or unsupervised learning algorithm, and identifying and extracting semantic relations between entities from the question text based on the trained relation extraction model;
the attribute extraction is performed by the following steps:
performing feature extraction on the question text using a rule-based matching method, a supervised or semi-supervised learning method, or a deep learning method;
identifying and extracting attributes based on the extracted features through a preconfigured classification model or a sequence annotation model;
the rule-based matching method comprises rule-based pattern matching and rule-based keyword matching;
when the deep learning method is used, feature extraction is performed on the question text through a trained BERT model.
4. The knowledge question-answering system construction method according to claim 1, wherein the constructed knowledge graph is stored through a graph database;
and according to the keywords and the entities in the question text, retrieving from the knowledge graph through the query language of the graph database, and returning the entities, the relations and the attributes related to the question text.
5. The knowledge question-answering system construction method according to claim 1, wherein the answer prediction model is a model constructed based on ChatGPT, ChatGLM or ERNIE Bot (Wenxin Yiyan).
6. A knowledge question-answering system construction system for constructing a knowledge question-answering system in a chemical field by the knowledge question-answering system construction method according to any one of claims 1 to 5, the construction system comprising:
the knowledge graph construction module is used for collecting and arranging knowledge data related to the chemical field, preprocessing the knowledge data through a natural language processing technology, extracting the relation among the entities, the relation and the attribute, and constructing a knowledge graph based on the relation among the entities, the relation and the attribute, wherein the relation is a semantic relation among the entities, and the attribute is descriptive information for describing the entities, including describing the characteristics and the properties of the entities;
the extraction module is used for analyzing and understanding the question text input by the user through a natural language processing technology and extracting entities, relations and attributes;
the retrieval matching module is used for carrying out information retrieval in the knowledge graph according to the keywords and the entities in the question text to obtain related entities, relations and attributes;
the information integration module is used for integrating the question text input by the user with the retrieved entities, relations and attributes to obtain a prompt;
the answer generation module is used for taking the prompt as input and generating a corresponding answer based on an answer prediction model constructed by a large model technology;
and the answer display module is used for carrying out formatted presentation on the generated answer based on the interface requirement of the user, and comprises display in the form of text and a chart.
7. The knowledge question-answering system construction system according to claim 6, wherein the knowledge data includes structured data and unstructured data;
for the structured data, the knowledge graph construction module is used for extracting the relationship among the entity, the relationship and the attribute in the modes of entity modeling, relationship modeling and triplet storage;
for unstructured data, the knowledge graph construction module is used for extracting the relation among the entities, the relation and the attributes in the way of entity extraction and relation extraction.
8. The knowledge question-answering system construction system according to claim 6, wherein the extraction module is configured to extract the entity by:
performing regular matching based on rules to identify named entities;
or, treating the named entity recognition as a sequence labeling problem based on a statistical model, wherein the statistical model comprises a hidden Markov model, a conditional Markov model and a conditional random field model;
or, using the word vectors of the question text as features and realizing end-to-end named entity recognition based on a neural network model;
the extraction module is used for extracting the relation by a rule-based method or a machine learning-based method;
the relation extraction is carried out by a rule-based method, which comprises the following steps: extracting semantic relationships between entities by identifying grammatical structures and context information in the question text using predefined rules and pattern matching techniques;
the relation extraction is carried out by a machine learning-based method, comprising the following steps: training a relation extraction model by using a supervised learning or unsupervised learning algorithm, and identifying and extracting semantic relations between entities from the question text based on the trained relation extraction model;
the extraction module is used for extracting the attributes through the following steps:
performing feature extraction on the question text using a rule-based matching method, a supervised or semi-supervised learning method, or a deep learning method;
identifying and extracting attributes based on the extracted features through a preconfigured classification model or a sequence annotation model;
the rule-based matching method comprises rule-based pattern matching and rule-based keyword matching;
when the deep learning method is used, feature extraction is performed on the question text through a trained BERT model.
9. The knowledge question-answering system construction system according to claim 6, wherein the knowledge graph is stored in a graph database;
the retrieval matching module is used for executing the following step: according to the keywords and the entities in the question text, retrieving from the knowledge graph through the query language of the graph database, and returning the entities, the relations and the attributes related to the question text.
10. The knowledge question-answering system construction system according to claim 6, wherein the answer prediction model is a model constructed based on ChatGPT, ChatGLM or ERNIE Bot (Wenxin Yiyan).
CN202310765310.1A 2023-06-27 2023-06-27 Knowledge question-answering system construction method and system Pending CN117033571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310765310.1A CN117033571A (en) 2023-06-27 2023-06-27 Knowledge question-answering system construction method and system


Publications (1)

Publication Number Publication Date
CN117033571A true CN117033571A (en) 2023-11-10

Family

ID=88634300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310765310.1A Pending CN117033571A (en) 2023-06-27 2023-06-27 Knowledge question-answering system construction method and system

Country Status (1)

Country Link
CN (1) CN117033571A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421416A (en) * 2023-12-19 2024-01-19 数据空间研究院 Interactive search method and device and electronic equipment
CN117436531A (en) * 2023-12-21 2024-01-23 安徽大学 Question answering system and method based on rice pest knowledge graph
CN117454884A (en) * 2023-12-20 2024-01-26 上海蜜度科技股份有限公司 Method, system, electronic device and storage medium for correcting historical character information
CN117520568A (en) * 2024-01-04 2024-02-06 北京奇虎科技有限公司 Knowledge graph attribute completion method, device, equipment and storage medium
CN117992069A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Code quality control method and device based on large language model

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421416A (en) * 2023-12-19 2024-01-19 数据空间研究院 Interactive search method and device and electronic equipment
CN117421416B (en) * 2023-12-19 2024-03-26 数据空间研究院 Interactive search method and device and electronic equipment
CN117454884A (en) * 2023-12-20 2024-01-26 上海蜜度科技股份有限公司 Method, system, electronic device and storage medium for correcting historical character information
CN117454884B (en) * 2023-12-20 2024-04-09 上海蜜度科技股份有限公司 Method, system, electronic device and storage medium for correcting historical character information
CN117436531A (en) * 2023-12-21 2024-01-23 安徽大学 Question answering system and method based on rice pest knowledge graph
CN117520568A (en) * 2024-01-04 2024-02-06 北京奇虎科技有限公司 Knowledge graph attribute completion method, device, equipment and storage medium
CN117992069A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Code quality control method and device based on large language model

Similar Documents

Publication Publication Date Title
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN107679039B (en) Method and device for determining statement intention
CN117033571A (en) Knowledge question-answering system construction method and system
US20170286835A1 (en) Concept Hierarchies
Kheiri et al. Sentimentgpt: Exploiting gpt for advanced sentiment analysis and its departure from current machine learning
US20170161619A1 (en) Concept-Based Navigation
Kim et al. SAO2Vec: Development of an algorithm for embedding the subject–action–object (SAO) structure using Doc2Vec
CN112287090A (en) Financial question asking back method and system based on knowledge graph
CN112528654A (en) Natural language processing method and device and electronic equipment
CN117271724A (en) Intelligent question-answering implementation method and system based on large model and semantic graph
Peng et al. Image to LaTeX with graph neural network for mathematical formula recognition
Dewi et al. Shapley additive explanations for text classification and sentiment analysis of internet movie database
Tadejko Cloud cognitive services based on machine learning methods in architecture of modern knowledge management solutions
Albesta et al. The impact of sentiment analysis from user on Facebook to enhanced the service quality.
Rajanak et al. Language detection using natural language processing
Zhang et al. Modeling the relationship between user comments and edits in document revision
CN114417008A (en) Construction engineering field-oriented knowledge graph construction method and system
Rafi et al. A linear sub-structure with co-variance shift for image captioning
Zishumba Sentiment Analysis Based on Social Media Data
Cuadrado et al. team UTB-NLP at finances 2023: financial targeted sentiment analysis using a phonestheme semantic approach
Rybak et al. Machine Learning-Enhanced Text Mining as a Support Tool for Research on Climate Change: Theoretical and Technical Considerations
Bharadi Sentiment Analysis of Twitter Data Using Named Entity Recognition
CN117743315B (en) Method for providing high-quality data for multi-mode large model system
Sharma et al. Detecting anomalies, contradictions, and contextual analysis through NLP in text
Deelip et al. Analysis of Twitter Data for Prediction of Iphone X Reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination