CN118093834B - AIGC large model-based language processing question-answering system and method - Google Patents


Publication number
CN118093834B
CN118093834B (Application CN202410479542.5A)
Authority
CN
China
Prior art keywords
domain
aigc
answer
question
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410479542.5A
Other languages
Chinese (zh)
Other versions
CN118093834A (en)
Inventor
朱志强 (Zhu Zhiqiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bonning Digital Technology Co ltd
Original Assignee
Bonning Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bonning Digital Technology Co ltd
Priority to CN202410479542.5A
Publication of CN118093834A
Application granted
Publication of CN118093834B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of language processing, and in particular to an AIGC large-model-based language processing question-answering system and method, comprising the following steps: receiving a natural language question input by a user and extracting key information through grammar analysis and semantic understanding; inputting the extracted key information into an AIGC-based language model and, after domain adaptability enhancement processing, generating a series of answer candidates with the AIGC large model from the input information and the enhanced domain knowledge; evaluating the answer candidates to select an optimal answer; and outputting the optimal answer to the user in natural language form. The invention significantly enhances the adaptability of AIGC large models to domain-specific questions and their capacity to process them. This adaptability reinforcement not only broadens the applicability of the question-answering system across professional fields, but also improves its flexibility and accuracy when facing new domains or cold questions.

Description

AIGC large model-based language processing question-answering system and method
Technical Field
The invention relates to the technical field of language processing, and in particular to an AIGC large-model-based language processing question-answering system and method.
Background
In the current state of the art, significant advances have been made in artificial intelligence and Natural Language Processing (NLP), particularly in language understanding and generation. AIGC (Artificial Intelligence Generated Content) technology, and large pre-trained language models in particular, has demonstrated great capability across many language processing tasks: such models can understand complex language constructs and contextual meaning, and can perform many language-based tasks such as text classification, sentiment analysis, text summarization, and question answering.
Nonetheless, existing language processing question-answering systems still face key challenges. One is how to effectively understand and answer cold (i.e., rare, long-tail) questions that relate to a particular domain (e.g., medicine, law, or technology); such questions often contain terms of art and complex concepts, requiring the system to have in-depth domain knowledge and understanding capability. In addition, the prior art is limited in the diversity and naturalness of the answers it generates and in its user interaction.
Furthermore, despite the extensive knowledge coverage of large models, they remain limited in adaptability and flexibility within particular fields. For example, a generic language model trained on broad data may have difficulty accurately handling cold terms and questions that occur only in certain specialized areas. Improving model performance in a specific field, and improving the accuracy, relevance, and naturalness of answers, has therefore become an important focus of research and development.
In summary, while existing AIGC techniques and language models have achieved significant results on a wide range of language tasks, domain-specific question answering, answer quality optimization, and the user interaction experience still need improvement. A language processing question-answering method that effectively integrates domain knowledge, improves answer generation quality, and optimizes user interaction is therefore of great significance for advancing language processing technology.
Disclosure of Invention
In view of the above, the invention provides an AIGC large-model-based language processing question-answering system and method.
An AIGC large-model-based language processing question-answering method comprises the following steps:
S1: receiving a natural language question input by a user, and extracting key information through grammar analysis and semantic understanding;
S2: inputting the extracted key information into an AIGC-based language model and, through domain adaptability enhancement processing, generating a series of answer candidates with the AIGC large model from the input information and the enhanced domain knowledge;
S3: evaluating the answer candidates to select an optimal answer;
S4: outputting the optimal answer to the user in natural language form.
Further, the S1 specifically includes:
S11, receiving: receiving a natural language question input by a user through a user interface, wherein the user interface supports two modes of text input and voice input;
S12, preprocessing: preprocessing the question entered by the user, including removing irrelevant characters, correcting spelling errors, and converting speech input to text (if the input was speech);
S13, grammar analysis: analyzing the questions by using natural language processing technology, and identifying sentence structures including sentence components of subjects, predicates and objects;
S14, semantic understanding: carrying out semantic analysis on the problem through a deep learning model and a natural language understanding algorithm, and understanding the intention and the contextual meaning of the problem;
S15, extracting key information: based on the results of the grammar analysis and semantic understanding, extracting the key information in the question, which comprises:
Keywords: the main nouns, verbs, and adjectives in the question that refer to specific concepts, objects, or actions;
Entity identification: specific entities mentioned in the question, including names, places, organizations, and dates;
Relationships and attributes: relationships between entities implied in the question, together with related attributes and features;
Question type: the type of question, determined from its structure and wording, such as a factual query, an interpretation request, or an operation guide.
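The extraction in S11–S15 can be sketched as a minimal rule-based pipeline. The stop-word list, entity heuristics, and question-type rules below are illustrative assumptions only; the patent's actual system would use trained parsers and named-entity recognition models rather than these toy rules:

```python
import re

# Hypothetical, minimal rule-based extractor illustrating S13-S15.
# Stop words, the entity heuristic, and question-type patterns are
# invented for illustration, not taken from the patent.
STOP_WORDS = {"what", "is", "the", "a", "an", "of", "in", "how",
              "does", "do", "when", "was", "why"}

QUESTION_TYPES = [
    (re.compile(r"^(what|who|when|where|which)\b", re.I), "factual query"),
    (re.compile(r"^(why|explain)\b", re.I), "interpretation request"),
    (re.compile(r"^how\b", re.I), "operation guide"),
]

def extract_key_information(question: str) -> dict:
    tokens = re.findall(r"[A-Za-z0-9']+", question)
    # Keywords: content words remaining after stop-word removal.
    keywords = [t for t in tokens if t.lower() not in STOP_WORDS]
    # Very rough entity heuristic: capitalized mid-sentence words plus 4-digit years.
    entities = [t for t in tokens[1:] if t[0].isupper()]
    entities += re.findall(r"\b\d{4}\b", question)
    qtype = "other"
    for pattern, label in QUESTION_TYPES:
        if pattern.search(question):
            qtype = label
            break
    return {"keywords": keywords, "entities": entities, "question_type": qtype}

info = extract_key_information("When was Penicillin discovered in London?")
```

A production implementation would replace each heuristic with the deep-learning components named in S13 and S14, but the output shape (keywords, entities, question type) matches the key information S15 describes.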
Further, the domain adaptability enhancement processing in S2 specifically includes:
S21: determining the specific domain to which the question belongs using a domain recognition algorithm, and extracting the question and term library related to that domain;
S22: matching cold terms and concepts in the question against the nodes of a domain-specific knowledge graph, constructed in cooperation with domain experts, so as to understand their deep meanings and contextual relationships;
S23: adjusting the AIGC large model in real time using the question context and the domain knowledge graph, so as to enhance the model's ability to handle cold questions and technical terms;
S24: inputting the adjusted question representation and domain knowledge as enhancement information into the AIGC-based language model, in preparation for generating more accurate and specialized answers.
Further, the step S21 specifically includes:
Feature extraction: extracting language features from the user's question, including word frequency, part-of-speech tags, semantic role labels, and contextual embedding vectors, wherein these features comprehensively reflect the linguistic characteristics and deep semantics of the question;
Domain feature vectorization: converting the extracted features into a domain feature vector $q$, wherein each dimension is a numerical representation of a language feature associated with the domain;
Domain similarity calculation: using the domain identification algorithm, computing the similarity between the question feature vector $q$ and each vector $d_i$ in a predefined set of domain vectors $\{d_1, \dots, d_n\}$ (each $d_i$ being the feature vector representing a particular domain) as the cosine similarity:

$$\mathrm{sim}(q, d_i) = \frac{q \cdot d_i}{\|q\|\,\|d_i\|}$$

wherein $q \cdot d_i$ is the dot product of the vectors, and $\|q\|$ and $\|d_i\|$ are respectively the Euclidean norms of $q$ and $d_i$;
Domain determination: selecting the domain corresponding to the domain vector with the highest similarity as the specific domain to which the question belongs;
Term library extraction: according to the determined domain, extracting from a database the professional question and term library related to that domain, including its key terms, definitions, common questions, and answer information.
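The domain identification of S21 reduces to a nearest-domain search under cosine similarity. A minimal sketch follows; the domain names and feature vectors are toy placeholders, not real domain models:

```python
import math

# Illustrative sketch of S21's domain identification: cosine similarity
# between a question feature vector and predefined domain vectors.
def cosine_similarity(q, d):
    # sim(q, d) = (q . d) / (||q|| * ||d||)
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)

def identify_domain(q, domain_vectors):
    # Pick the domain whose vector is most similar to the question vector.
    return max(domain_vectors, key=lambda name: cosine_similarity(q, domain_vectors[name]))

# Toy domain vectors (invented values for illustration).
domain_vectors = {
    "medical": [0.9, 0.1, 0.0],
    "legal":   [0.1, 0.9, 0.1],
    "tech":    [0.0, 0.2, 0.9],
}
q = [0.8, 0.2, 0.1]  # hypothetical question feature vector
best = identify_domain(q, domain_vectors)
```

Because cosine similarity depends only on the angle between vectors, the question vector above is classified as "medical" even though its magnitude differs from the domain vectors'.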
Further, the step S22 specifically includes:
Knowledge graph construction: in cooperation with domain experts, constructing a knowledge graph of the important concepts, terms, and entities in the domain and their interrelationships, wherein each node represents a concept or entity in the domain and the edges between nodes represent relationships between concepts or entities;
Cold term identification: analyzing the user's question through natural language processing techniques and identifying the cold terms and concepts it contains, wherein a cold term is a word that occurs rarely in the corpus but has a specific meaning in the particular domain;
Term-graph mapping: mapping the identified cold terms and concepts onto nodes of the knowledge graph, wherein the mapping uses a matching algorithm based on semantic similarity that considers the semantic features of the terms and the attributes of the graph nodes to determine the best-matching node;
Contextual relationship resolution: analyzing the contextual relationships between the cold terms and concepts in the question using the edges of the knowledge graph, wherein examining the other nodes connected to the matched node, and the types of those relationships, reveals the role and meaning of the cold term in the specific question;
Deep meaning understanding: analyzing the deep meaning of the cold terms and concepts by combining the structural information of the graph with the contextual relationships of the terms.
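The term-to-graph mapping and context resolution of S22 can be sketched with a hand-built triple store. The graph content and the character-trigram similarity (a crude stand-in for the semantic matching algorithm the patent describes) are illustrative assumptions:

```python
# Toy sketch of S22: map a rare term onto a hand-built domain knowledge
# graph and read its context off the connected edges. Nodes and edges
# are invented example data.
GRAPH = {
    "nodes": {"ECMO", "heart-lung machine", "cardiac surgery", "oxygenation"},
    "edges": [  # (subject, relation, object)
        ("ECMO", "is a", "heart-lung machine"),
        ("ECMO", "is related to", "cardiac surgery"),
        ("ECMO", "provides", "oxygenation"),
    ],
}

def char_jaccard(a: str, b: str) -> float:
    # Crude stand-in for semantic similarity: character-trigram Jaccard overlap.
    grams = lambda s: {s[i:i + 3] for i in range(max(1, len(s) - 2))}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb)

def map_term(term: str) -> str:
    # Term-graph mapping: choose the best-matching node.
    return max(GRAPH["nodes"], key=lambda n: char_jaccard(term, n))

def context_of(node: str):
    # Contextual relationship resolution: edges leaving the matched node.
    return [(r, o) for s, r, o in GRAPH["edges"] if s == node]

node = map_term("ECMO therapy")
relations = context_of(node)
```

In a real system the similarity function would operate on embeddings and node attributes, but the flow (identify, map, then traverse edges for context) is the one S22 describes.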
Further, the step S23 specifically includes:
S231, context and domain knowledge integration: integrating the contextual information of the question with the deep meanings and relationships of its cold terms and related concepts, obtained through the domain knowledge graph, into an enhanced feature representation that contains both the original semantic information of the question and the deep knowledge of the specific domain;
S232, feature conversion: converting the integrated feature representation into a form suitable for the AIGC large model using a self-encoder algorithm. The encoder is expressed as $h = f(W_e x + b_e)$, wherein $x$ is the input feature, $W_e$ is the encoder weight, $b_e$ is a bias term, $f$ is the activation function, and $h$ is the generated hidden-layer representation (i.e., the encoding). The decoder is expressed as $\hat{x} = g(W_d h + b_d)$, wherein $W_d$ is the decoder weight, $b_d$ is a bias term, $g$ is the activation function, and $\hat{x}$ is the reconstructed input. The goal of the self-encoder is to minimize the difference between the input $x$ and the reconstruction $\hat{x}$, using the loss function $L(x, \hat{x}) = \|x - \hat{x}\|^2$. Training the self-encoder to minimize this loss learns a compressed representation of the input data, which is used for the feature conversion;
S233, model adjustment: based on the converted feature representation, adjusting the parameters of the AIGC large model in real time. The adjustment uses transfer learning, so that the AIGC large model adapts to the background and semantic requirements of the specific domain of the current question. The transfer learning process is as follows:
pre-training the model on a source task, learning a representation of the source-domain data;
migrating a portion of the pre-trained model (e.g., the feature extraction layers) to the target task;
fine-tuning the migrated portion on the target-domain data while keeping or lightly tuning the other portions;
S234, enhanced processing capability verification: checking, through a preset verification mechanism, whether the model's ability to handle cold questions and technical terms is significantly enhanced after adjustment, so as to ensure that the adjustment effect meets expectations.
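The self-encoder of S232 can be illustrated with a tiny linear autoencoder trained by gradient descent on the squared reconstruction loss. The dimensions, data, and identity activations are simplifying assumptions; a real system would use a deep, nonlinear network:

```python
import random

# Minimal linear self-encoder sketching S232:
#   encode: h = We x + be,  decode: xr = Wd h + bd,
#   loss:   L = ||x - xr||^2, minimized by gradient descent.
random.seed(0)
n_in, n_hid = 4, 2
We = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
be = [0.0] * n_hid
Wd = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
bd = [0.0] * n_in

def encode(x):
    return [sum(We[j][i] * x[i] for i in range(n_in)) + be[j] for j in range(n_hid)]

def decode(h):
    return [sum(Wd[i][j] * h[j] for j in range(n_hid)) + bd[i] for i in range(n_in)]

def loss(x, xr):
    return sum((a - b) ** 2 for a, b in zip(x, xr))

def train_step(x, lr=0.01):
    h = encode(x)
    xr = decode(h)
    g_out = [2 * (xr[i] - x[i]) for i in range(n_in)]  # dL/dxr
    # Backpropagate to the hidden layer BEFORE updating decoder weights.
    g_h = [sum(g_out[i] * Wd[i][j] for i in range(n_in)) for j in range(n_hid)]
    for i in range(n_in):                              # decoder gradients
        bd[i] -= lr * g_out[i]
        for j in range(n_hid):
            Wd[i][j] -= lr * g_out[i] * h[j]
    for j in range(n_hid):                             # encoder gradients
        be[j] -= lr * g_h[j]
        for i in range(n_in):
            We[j][i] -= lr * g_h[j] * x[i]
    return loss(x, xr)

x = [0.2, -0.1, 0.4, 0.3]  # toy integrated feature representation
first = train_step(x)
for _ in range(200):
    last = train_step(x)
```

After training, `encode(x)` gives the compressed two-dimensional representation that S232 feeds onward for feature conversion; the reconstruction loss falls as training proceeds.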
Further, the AIGC large model in S2 specifically performs:
Enhancement information integration: integrating the adjusted representation of the user's question and the domain knowledge into an enhancement information set comprising the adjusted question features and the domain-specific terms, concepts, and their interrelationships;
Context-aware encoding: processing the enhancement information set with an encoder to capture the complex relationships between the deep semantic features of the question and the domain knowledge, wherein the encoder outputs a high-dimensional feature representation synthesizing the question context and the domain knowledge;
Answer generation: inputting the encoded high-dimensional feature representation into the decoder of the AIGC large model, which uses it to generate a series of answer candidates through a sequence generation mechanism while taking the question context and domain knowledge into account.
Further, in S3 a beam search is used to evaluate the answer candidates so that the generated answers are both diverse and highly relevant. The beam search specifically includes:
Initialization: setting the beam width $k$; at the start of decoding, initializing a beam of size $k$ in which each candidate is a partial answer sequence containing only a start tag (e.g., <start>);
Iterative expansion: in each iteration, for each partial answer sequence in the beam, predicting the next word (or token) and its probability; for each partial answer, selecting the $k$ highest-probability words and appending each to the partial answer to form new partial answer sequences;
Score calculation: computing the score of each newly generated partial answer sequence by accumulating the log-probabilities of its constituent words:

$$\mathrm{score}(Y) = \sum_{t=1}^{T} \log P(y_t \mid y_{<t}, c)$$

wherein $Y$ is a partial answer sequence, $y_t$ is the $t$-th word in the sequence, $P(y_t \mid y_{<t}, c)$ is the conditional probability of word $y_t$ given the preceding words $y_{<t}$ and the context $c$ (i.e., the question representation and domain knowledge), and $T$ is the number of words in the sequence;
Selection and retention: after each iteration, selecting from all newly generated partial answers the $k$ highest-scoring sequences and keeping them in the beam for the next round of iterative expansion;
Termination condition: the iterative process continues until a predefined maximum length is reached, or the partial answer sequences in the beam end with an end tag (e.g., <end>);
The highest-scoring sequence in the final beam is selected as the answer; where multiple answer candidates are required, the top-ranked sequences are selected.
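The beam-search procedure above can be sketched end to end on a toy conditional distribution. The vocabulary and probabilities are invented, standing in for the AIGC model's predictions conditioned on the question and domain context:

```python
import math

# Toy beam search sketching S3. P(next | last token) is a hand-made
# table; in the real system these probabilities come from the model.
PROBS = {
    "<start>":  {"the": 0.6, "a": 0.4},
    "the":      {"answer": 0.7, "question": 0.3},
    "a":        {"answer": 0.5, "question": 0.5},
    "answer":   {"<end>": 1.0},
    "question": {"<end>": 1.0},
}

def beam_search(k=2, max_len=5):
    beam = [(["<start>"], 0.0)]  # (sequence, accumulated log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beam:
            if seq[-1] == "<end>":          # finished sequences carry over
                candidates.append((seq, score))
                continue
            for word, p in PROBS[seq[-1]].items():
                candidates.append((seq + [word], score + math.log(p)))
        # Keep the k highest-scoring partial sequences.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == "<end>" for seq, _ in beam):
            break
    return beam[0][0]

best = beam_search()
```

With beam width 2 the search keeps both "the ..." and "a ..." prefixes alive for one step, then settles on the highest-scoring completed sequence.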
Further, S4 further includes selecting a formatting scheme and adding contextual information to the answer, and highlighting or emphasizing key information in the answer according to its content and type, including using bold, italics, or color changes to draw the user's attention to the important parts.
An AIGC large-model-based language processing question-answering system, for implementing the above AIGC large-model-based language processing question-answering method, comprises the following modules:
User interface module: responsible for receiving the natural language question input by the user, supporting both text and voice input, and presenting the final answer to the user in a natural, user-friendly manner;
Question understanding module: performs grammar analysis and semantic understanding on the question input by the user using natural language processing techniques, and extracts the question's key information, including keywords, entities, relationships, and question type;
Domain adaptability enhancement processing module: comprises a domain identification sub-module, a domain knowledge graph matching sub-module, and a domain-adaptive algorithm sub-module, which respectively determine the specific domain to which the question belongs, match related concepts in the domain knowledge graph, and adjust the AIGC large model in real time;
Answer generation module: generates a series of answer candidates from the question context and domain knowledge using the AIGC large model after domain adaptability enhancement processing, and optimizes the answer generation process with a beam search algorithm;
Answer evaluation and selection module: comprehensively evaluates the answer candidates, including content overlap measurement, semantic similarity measurement, language fluency checking, and grammatical correctness verification, so as to select the optimal answer.
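The answer evaluation and selection module might combine its sub-metrics as a weighted score. The metrics below (token overlap standing in for relevance, a length heuristic standing in for fluency) and the weights are illustrative assumptions, not the patent's semantic-similarity and grammar checks:

```python
# Illustrative composite scoring for answer evaluation and selection.
# Sub-metrics and weights are toy stand-ins for the real measures.
def tokens(text):
    return set(text.lower().split())

def relevance(question, answer):
    # Content-overlap proxy: Jaccard overlap of question and answer tokens.
    q, a = tokens(question), tokens(answer)
    return len(q & a) / len(q | a) if q | a else 0.0

def fluency(answer):
    # Crude fluency proxy: penalize very short or very long answers.
    n = len(answer.split())
    return 1.0 if 3 <= n <= 30 else 0.3

def score(question, answer, w_rel=0.7, w_flu=0.3):
    return w_rel * relevance(question, answer) + w_flu * fluency(answer)

def select_best(question, candidates):
    return max(candidates, key=lambda c: score(question, c))

best = select_best(
    "what causes acid rain",
    ["rain", "acid rain is mainly caused by sulfur dioxide emissions"],
)
```

A real module would swap in embedding-based semantic similarity and a grammar checker for the two proxies, but the selection logic (score each candidate, keep the maximum) is the same.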
The beneficial effects of the invention are as follows:
By combining domain adaptability enhancement processing with a domain knowledge graph, the method can accurately understand and answer queries involving cold questions and specialized terms in a specific field. This process not only deepens the model's understanding of a question's underlying meaning, but also ensures the accuracy and high relevance of the answers, thereby meeting the needs of users in specialized fields.
Through domain adaptability enhancement processing, the method can deeply understand cold terms and complex concepts in a specific field, ensuring the expertise and accuracy of the answers. This deep understanding enables the system to process and answer specialized questions that traditional language models struggle to capture accurately, and the domain knowledge graph and real-time adjustment mechanism markedly enhance the AIGC large model's adaptability to, and capacity for, domain-specific questions. This reinforcement not only broadens the question-answering system's applicability across professional fields, but also improves its flexibility and accuracy when facing new domains or cold questions.
Through the beam search algorithm, the method can select the best answer from a wide range of candidate answers. The selection mechanism scores candidates comprehensively on relevance and naturalness, ensuring that the answer finally presented to the user is not only highly relevant to the question but also fluent and natural in expression. By keeping multiple optimal candidate solutions at each step, beam search also ensures answer diversity, which is particularly important for open questions with multiple possible answers and allows the system to provide more comprehensive information to meet different users' needs.
Drawings
In order to illustrate the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings described below are only those of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the invention;
fig. 2 is a schematic diagram of a system module according to an embodiment of the invention.
Detailed Description
The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
It is to be noted that unless otherwise defined, technical or scientific terms used herein should be taken in a general sense as understood by one of ordinary skill in the art to which the present invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
As shown in FIG. 1, an AIGC large-model-based language processing question-answering method comprises the following steps:
S1: receiving a natural language question input by a user, and extracting key information through grammar analysis and semantic understanding;
S2: inputting the extracted key information into an AIGC-based language model and, through domain adaptability enhancement processing, generating a series of answer candidates with the AIGC large model from the input information and the enhanced domain knowledge;
S3: evaluating the answer candidates to select an optimal answer;
S4: outputting the optimal answer to the user in natural language form.
S1 specifically comprises:
S11, receiving: receiving a natural language question input by a user through a user interface, wherein the user interface supports two modes of text input and voice input;
S12, preprocessing: preprocessing the question entered by the user, including removing irrelevant characters, correcting spelling errors, and converting speech input to text (if the input was speech);
S13, grammar analysis: analyzing the questions by using natural language processing technology, and identifying sentence structures including sentence components of subjects, predicates and objects;
S14, semantic understanding: carrying out semantic analysis on the problem through a deep learning model and a natural language understanding algorithm, and understanding the intention and the contextual meaning of the problem;
S15, extracting key information: based on the results of the grammar analysis and semantic understanding, extracting the key information in the question, which comprises:
Keywords: the main nouns, verbs, and adjectives in the question that refer to specific concepts, objects, or actions;
Entity identification: specific entities mentioned in the question, including names, places, organizations, and dates;
Relationships and attributes: relationships between entities implied in the question, together with related attributes and features;
Question type: the type of question, determined from its structure and wording, such as a factual query, an interpretation request, or an operation guide.
The domain adaptability enhancement processing in S2 specifically includes:
S21: determining the specific domain to which the question belongs using a domain recognition algorithm, and extracting the question and term library related to that domain;
S22: matching cold terms and concepts in the question against the nodes of a domain-specific knowledge graph, constructed in cooperation with domain experts, so as to understand their deep meanings and contextual relationships;
S23: adjusting the AIGC large model in real time using the question context and the domain knowledge graph, so as to enhance the model's ability to handle cold questions and technical terms;
S24: inputting the adjusted question representation and domain knowledge as enhancement information into the AIGC-based language model, in preparation for generating more accurate and specialized answers.
The enhancement information is input into the AIGC-based language model as follows.
Model input adjustment: according to the input requirements of the AIGC language model, the encoded representation of the enhancement information is integrated into the model's input; this requires adjusting the model's input layer to accept the new enhancement information vector as an additional input.
Context information integration: in the model's decoding stage, the enhancement information is used as additional contextual information to guide answer generation; this is realized by modifying the model's attention mechanism so that the context and domain knowledge provided by the enhancement information are taken into account when generating the answer.
Training and fine-tuning: finally, the AIGC model is trained or fine-tuned on a dataset containing the enhancement information, so that it adapts to the new input format and information, ensuring that the model can effectively use the enhancement information to generate more accurate and relevant answers.
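One way to realize the context integration described above is to append the encoded enhancement vector to the set of context vectors the decoder attends over, so the attention mechanism can weight domain knowledge alongside the question tokens. All vectors below are toy values rather than learned representations:

```python
import math

# Sketch of feeding enhancement information into decoding: the decoder
# attends over question-token encodings PLUS one extra enhancement
# vector appended to the context set. Values are invented examples.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, contexts):
    # Dot-product attention: weight each context vector by its match to the query.
    scores = [sum(q * c for q, c in zip(query, ctx)) for ctx in contexts]
    weights = softmax(scores)
    dim = len(contexts[0])
    return [sum(w * ctx[i] for w, ctx in zip(weights, contexts)) for i in range(dim)]

question_encodings = [[0.1, 0.4], [0.3, 0.2]]   # toy token encodings
enhancement_vector = [0.9, 0.8]                 # encoded domain knowledge (illustrative)
contexts = question_encodings + [enhancement_vector]

decoder_query = [1.0, 1.0]
context_summary = attend(decoder_query, contexts)
```

Because the enhancement vector scores highest against this query, it dominates the attention-weighted summary, which is exactly the effect the modified attention mechanism is meant to produce.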
S21 specifically comprises:
Feature extraction: extracting language features from the user's question, including word frequency, part-of-speech tags, semantic role labels, and contextual embedding vectors, wherein these features comprehensively reflect the linguistic characteristics and deep semantics of the question;
Domain feature vectorization: converting the extracted features into a domain feature vector $q$, wherein each dimension is a numerical representation of a language feature associated with the domain;
Domain similarity calculation: using the domain identification algorithm, computing the similarity between the question feature vector $q$ and each vector $d_i$ in a predefined set of domain vectors $\{d_1, \dots, d_n\}$ (each $d_i$ being the feature vector representing a particular domain) as the cosine similarity:

$$\mathrm{sim}(q, d_i) = \frac{q \cdot d_i}{\|q\|\,\|d_i\|}$$

wherein $q \cdot d_i$ is the dot product of the vectors, and $\|q\|$ and $\|d_i\|$ are respectively the Euclidean norms of $q$ and $d_i$; the formula measures the angle between the question feature vector and each domain vector in the vector space, and the smaller the angle, the higher the similarity;
Domain determination: selecting the domain corresponding to the domain vector with the highest similarity as the specific domain to which the question belongs;
Term library extraction: according to the determined domain, extracting from a database the professional question and term library related to that domain, including its key terms, definitions, common questions, and answer information.
S22 specifically comprises the following steps:
Constructing a knowledge graph: in cooperation with domain experts, constructing a knowledge graph containing important concepts, terms, entities and their interrelationships within a domain, each node representing a concept or entity within a domain, and edges between nodes representing the relationship between concepts or entities, such as "is a", "belongs to", "is related to";
Cold door term identification: analyzing user problems through natural language processing technology, and identifying cold terms and concepts in the problems, wherein the cold terms refer to words with low occurrence frequency in a corpus but specific meaning in specific fields;
Term map mapping: mapping the identified cold terms and concepts with nodes in the knowledge graph, wherein a matching algorithm based on semantic similarity is adopted in the mapping process, and semantic features of the terms and attributes of the graph nodes are considered to determine the best matching node;
contextual relationship resolution: analyzing the context relation between the cold term and the concept in the problem by utilizing edges in the knowledge graph, and revealing the effect and meaning of the cold term in the specific problem by analyzing other nodes connected with the matching nodes and the relation types thereof;
Deep meaning understanding: and analyzing the deep meaning of cold terms and concepts by comprehensively using the structural information of the atlas and the contextual relation of the terms, helping to fully understand the questions and providing support for generating accurate and relevant answers.
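The term-to-graph mapping and context-relationship analysis described above can be sketched as below; the toy graph, node embeddings, and relation names are hypothetical stand-ins for a real domain knowledge graph:

```python
import math

# Hypothetical toy knowledge graph: node -> (semantic embedding, {neighbor: relation})
GRAPH = {
    "myocardial_infarction": ([0.9, 0.1, 0.3], {"heart_disease": "is_a"}),
    "heart_disease":         ([0.7, 0.2, 0.4], {"cardiology": "belongs_to"}),
    "cardiology":            ([0.5, 0.5, 0.5], {}),
}

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def map_term(term_vec):
    """Term map mapping: match a cold term's embedding to the graph node
    with the highest semantic similarity."""
    return max(GRAPH, key=lambda n: _cos(term_vec, GRAPH[n][0]))

def context_relations(node):
    """Contextual relationship resolution: list (node, relation, neighbor)
    triples for the matched node, revealing its role in the domain."""
    return [(node, rel, nb) for nb, rel in GRAPH[node][1].items()]
```

A term whose embedding matches "myocardial_infarction" would then be explained through its "is_a" edge to "heart_disease".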
S23 specifically comprises the following steps:
S231, context and domain knowledge integration: integrating the context information of the problem and the deep meaning and relation of cold terms and related concepts thereof obtained through the domain knowledge graph into an enhanced feature representation, wherein the enhanced feature representation comprises the original semantic information of the problem and the deep knowledge of the specific domain;
S232, feature conversion: the integrated feature representation is converted into a form suitable for the AIGC large model using an autoencoder, an unsupervised neural network for learning efficient encodings of data. The basic structure comprises an encoder, which converts the input data into a lower-dimensional code, and a decoder, which attempts to reconstruct the input data from that code. The encoder is expressed as: h = f(W_e x + b_e), where x is the input feature, W_e is the encoder weight, b_e is a bias term, f is the activation function, and h is the generated hidden-layer representation (i.e., the code); the decoder is expressed as: x̂ = g(W_d h + b_d), where W_d is the decoder weight, b_d is a bias term, g is the activation function, and x̂ is the reconstructed input. The goal of the autoencoder is to minimize the difference between the input x and the reconstructed input x̂, using the loss function L(x, x̂) = ‖x − x̂‖²; training the autoencoder to minimize this loss function learns a compressed representation of the input data, which is used for feature conversion;
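A minimal sketch of the autoencoder's forward pass and reconstruction loss, assuming a sigmoid activation f for the encoder and an identity activation g for the decoder (the method does not fix particular activations, so these are illustrative choices):

```python
import math

def encode(x, W_e, b_e):
    """Encoder h = f(W_e x + b_e), with f = sigmoid."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(W_e, b_e)]

def decode(h, W_d, b_d):
    """Decoder x_hat = g(W_d h + b_d), with g = identity for simplicity."""
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W_d, b_d)]

def loss(x, x_hat):
    """Reconstruction loss L = ||x - x_hat||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

Training would adjust W_e, b_e, W_d, b_d by gradient descent on this loss; only the forward pass is sketched here.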
S233, model adjustment: based on the converted feature representation, the parameters of the AIGC large model are adjusted in real time. The adjustment process adopts transfer learning, so that the AIGC large model adapts to the specific domain background and semantic requirements of the current problem; transfer learning is a technique for improving performance on a related target task by reusing knowledge learned on a source task. Transfer learning generally involves one source task and one target task, with corresponding source-domain and target-domain datasets, and proceeds as follows:
Pre-training a model on a source task, and learning a representation of source domain data;
migrating a portion of the pre-trained model (e.g., feature extraction layer) to a target task;
Trimming the migrated model portion on the target domain data while maintaining or trimming other portions;
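The freeze-then-fine-tune pattern of the three transfer-learning steps above can be sketched as follows; the `Layer` class and the layer names in the usage are illustrative assumptions:

```python
class Layer:
    """Minimal stand-in for a model layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def transfer(pretrained_layers, n_frozen):
    """Migrate a pre-trained model to the target task: freeze the first
    n_frozen layers (e.g., the feature-extraction layers) and leave the
    remaining layers trainable for fine-tuning on target-domain data."""
    for i, layer in enumerate(pretrained_layers):
        layer.trainable = i >= n_frozen
    return pretrained_layers
```

For instance, freezing two feature layers of a three-layer model leaves only the head trainable during fine-tuning on the target domain.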
S234, enhanced processing-capability verification: a preset verification mechanism checks whether the model's ability to handle cold problems and technical terms is significantly enhanced after adjustment, ensuring the adjustment effect meets expectations.
The verification mechanism employs cross-validation or simulated problem testing.
The AIGC big model in S2 specifically includes:
enhancement information integration: integrating the adjusted representation of the user question and domain knowledge into an enhanced information set comprising adjusted question features, domain specific terms, concepts and their interrelationships;
context-aware coding: processing the enhanced information set with an encoder to capture complex relationships between deep semantic features of the problem and domain knowledge, the encoder outputting a high-dimensional feature representation of the comprehensive problem context and domain knowledge;
Answer generation: the encoded high-dimensional feature representation is input into the AIGC large-model decoder, which uses it to generate a series of answer candidates through a sequence-generation mechanism that takes the problem context and domain knowledge into account; based on the Transformer architecture, the decoder generates the answer sequence through self-attention and cross-attention mechanisms.
In S3, answer candidates are evaluated using beam search, which diversifies the generated answers while keeping them highly relevant, helping to avoid generating highly repetitive answers while ensuring answer quality and relevance. The generated answer candidates are then post-processed and optimized, including grammar correction, semantic consistency checking, and domain knowledge verification, to improve the accuracy and expertise of the answers. The beam search specifically includes:
initializing: setting beam width At the beginning of decoding, a size is initialized toEach candidate comprising a partial solution sequence with only a start tag (e.g., < start >);
and (3) iteration expansion: in each iteration, for each partial solution sequence in the bundle, the next vocabulary (or token) and its probability are predicted, and for each partial solution, the highest probability is selected Word, combine with this partial solution, form the new partial solution sequence;
Calculating the score: the score of each newly generated partial solution sequence is calculated by accumulating the logarithmic probabilities of its constituent words:
score(Y) = Σ_{t=1}^{T} log P(y_t | y_{<t}, c)
where Y is the partial solution sequence, y_t is the t-th word in the sequence, P(y_t | y_{<t}, c) is the conditional probability of word y_t given the preceding words y_{<t} and the context c (i.e., the problem representation and domain knowledge), and T is the number of words in the sequence;
Selecting and retaining: after each iteration, the k highest-scoring partial solution sequences are selected from all newly generated partial solutions and added to the beam for the next round of iterative expansion;
Termination condition: the iterative process continues until a predefined maximum length is reached, or the partial solution sequences in the beam end with an end tag (e.g., <end>);
the highest-scoring sequence is selected from the final beam as the answer candidate; where multiple answer candidates are required, the top-ranked sequences are selected.
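The beam-search procedure above can be sketched as below, assuming a caller-supplied `step_probs` function that returns next-token probabilities conditioned on the partial sequence (standing in for the decoder's conditional distribution P(y_t | y_{<t}, c)):

```python
import math

def beam_search(step_probs, k, max_len):
    """Beam search with width k. Score = accumulated log-probabilities.
    step_probs(seq) -> {word: probability} for the next token."""
    beam = [(["<start>"], 0.0)]  # (partial solution sequence, log-prob score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beam:
            if seq[-1] == "<end>":          # finished sequences carry over
                candidates.append((seq, score))
                continue
            probs = step_probs(seq)
            # expand with the k highest-probability next words
            for word, p in sorted(probs.items(), key=lambda kv: -kv[1])[:k]:
                candidates.append((seq + [word], score + math.log(p)))
        # keep the k highest-scoring partial solutions
        beam = sorted(candidates, key=lambda c: -c[1])[:k]
        if all(seq[-1] == "<end>" for seq, _ in beam):
            break
    return max(beam, key=lambda c: c[1])[0]
```

With a toy two-step distribution, the search returns the sequence built from the most probable words at each step.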
S4 further includes selecting a formatting scheme according to the content and type of the answer, for example, if the answer is a list (e.g., step, option, etc.), then it is presented in a list form; if the answer contains date, number or specific data, it is ensured that the format of this information is standardized and easy to read;
adding context information to the answer, allowing the user to understand the answer even without seeing a complete question-answer history, which may include brief question repetition, introducing background information of the answer, or interpreting specific terms;
Highlighting or emphasizing key information in the answer includes using bolded, italic, or color changes to draw the user's attention to important parts.
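The S4 formatting rules above can be sketched as follows; the markdown-style bold emphasis and the helper's signature are illustrative choices, not mandated by the method:

```python
def format_answer(answer, context_note=None, keywords=()):
    """Format an answer for presentation:
    - a list answer (steps, options) is rendered as bullet items;
    - key information is emphasized (here with markdown-style bold);
    - optional context (e.g., a brief restatement of the question) is prepended."""
    if isinstance(answer, list):
        body = "\n".join(f"- {item}" for item in answer)
    else:
        body = answer
        for kw in keywords:
            body = body.replace(kw, f"**{kw}**")
    return f"{context_note}\n{body}" if context_note else body
```

For example, a step-by-step answer is rendered as a bulleted list, while a dosage figure in a prose answer is bolded to draw the user's attention.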
As shown in fig. 2, a language processing question-answering system based on AIGC big models is used for implementing the above-mentioned language processing question-answering method based on AIGC big models, and includes the following modules:
A user interface module: the module is responsible for receiving natural language questions input by a user and supporting text and speech form input of the questions, and presenting the final answers to the user in a natural, user-friendly manner;
problem understanding module: carrying out grammar analysis and semantic understanding on the problem input by the user by using a natural language processing technology, and extracting key information of the problem, wherein the key information comprises key words, entities, relations and problem types;
Domain adaptability enhancement processing module: the method comprises a domain identification sub-module, a domain knowledge graph matching sub-module and a domain adaptive algorithm sub-module, wherein the domain identification sub-module, the domain knowledge graph matching sub-module and the domain adaptive algorithm sub-module are used for determining the specific domain to which the problem belongs, matching related concepts in the domain knowledge graph and adjusting AIGC large models in real time;
Answer generation module: generating a series of answer candidates according to the context and domain knowledge of the questions by using AIGC large models subjected to domain adaptability enhancement processing, and optimizing an answer generation process by adopting a beam search algorithm;
Answer evaluation and selection module: and comprehensively evaluating answer candidates, including content overlapping measurement, semantic similarity measurement, language fluency check and grammar correctness verification, so as to select an optimal answer.
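The module chain above can be sketched as a simple pipeline; the four function parameters stand in for the question understanding, domain-adaptability enhancement, answer generation, and answer evaluation modules, and are assumptions for illustration:

```python
def run_pipeline(question, understand, enhance, generate, evaluate):
    """Chain the four processing modules behind the user interface:
    question understanding -> domain-adaptability enhancement ->
    answer generation -> answer evaluation and selection."""
    key_info = understand(question)       # grammar analysis, semantic understanding
    enhanced = enhance(key_info)          # domain identification, graph matching, model adjustment
    candidates = generate(enhanced)       # AIGC large model + beam search
    return evaluate(candidates)           # overlap, similarity, fluency, grammar checks
```

Each module can be swapped independently as long as it respects this input/output contract.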
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims (4)

1. A language processing question-answering method based on AIGC big models is characterized by comprising the following steps:
S1: receiving natural language questions input by a user, and extracting key information through grammar analysis and semantic understanding technology;
S2: inputting the extracted key information into a AIGC-based language model, and generating a series of answer candidates by using a AIGC large model according to the input information and enhanced domain knowledge through domain adaptability enhancement processing, wherein the domain adaptability enhancement processing specifically comprises the following steps:
s21: determining a specific field to which the problem belongs by using a field recognition algorithm, and extracting a problem and a term library related to the specific field;
S22: matching cold terms and concepts in the problem with nodes in the map through a domain-specific knowledge map constructed by cooperation with a specific domain expert so as to understand the deep meaning and the context relationship of the cold terms and concepts;
s23: the AIGC large model is adjusted in real time by combining the context of the problem and the domain knowledge graph so as to enhance the processing capability of the model on the cold problem and the technical term;
s24: inputting the adjusted question representation and domain knowledge as enhancement information into a AIGC-based language model to prepare for generating more accurate and specialized answers;
The step S21 specifically comprises the following steps:
Feature extraction: extracting language features from the user problems, including word frequency, part-of-speech tagging, semantic role tagging and context embedding vectors;
Domain feature vectorization: converting extracted features into domain feature vectors Wherein each dimension represents a numerical representation of a language feature associated with the domain;
Domain similarity calculation: computing the similarity between the problem feature vector q and each vector d_i in a predefined set of domain vectors using cosine similarity:
sim(q, d_i) = (q · d_i) / (‖q‖ ‖d_i‖)
where q · d_i denotes the dot product of the vectors, and ‖q‖ and ‖d_i‖ are the Euclidean norms of q and d_i respectively;
the domain determination, namely selecting the domain corresponding to the domain vector with the highest similarity as the specific domain to which the problem belongs;
term library extraction: extracting professional questions and a term library related to the field from a database according to the determined field, wherein the professional questions and the term library comprise key terms, definitions, common questions and answer information thereof in the field;
the step S22 specifically includes:
constructing a knowledge graph: in cooperation with domain experts, constructing a knowledge graph containing important concepts, terms, entities and interrelationships thereof in the domain, wherein each node represents a concept or entity in the domain, and edges between the nodes represent the relationship between the concepts or entities;
Cold term identification: analyzing the user question through natural language processing techniques to identify cold terms and concepts in the question, where cold terms are words that occur with low frequency in the corpus but carry specific meaning in particular domains;
Term map mapping: mapping the identified cold terms and concepts with nodes in the knowledge graph, wherein a matching algorithm based on semantic similarity is adopted in the mapping process, and semantic features of the terms and attributes of the graph nodes are considered to determine the best matching node;
contextual relationship resolution: analyzing the context relation between the cold term and the concept in the problem by utilizing edges in the knowledge graph, and revealing the effect and meaning of the cold term in the specific problem by analyzing other nodes connected with the matching nodes and the relation types thereof;
deep meaning understanding: analyzing the deep meaning of cold terms and concepts by comprehensively using the structural information of the atlas and the context relation of the terms;
The step S23 specifically comprises the following steps:
S231, context and domain knowledge integration: integrating the context information of the problem and the deep meaning and relation of cold terms and related concepts thereof obtained through the domain knowledge graph into an enhanced feature representation, wherein the enhanced feature representation comprises the original semantic information of the problem and the deep knowledge of the specific domain;
S232, feature conversion: the integrated feature representation is converted into a form suitable for the AIGC large model using an autoencoder; the encoder is expressed as: h = f(W_e x + b_e), where x is the input feature, W_e is the encoder weight, b_e is a bias term, f is the activation function, and h is the generated hidden-layer representation; the decoder is expressed as: x̂ = g(W_d h + b_d), where W_d is the decoder weight, b_d is a bias term, g is the activation function, and x̂ is the reconstructed input; the goal of the autoencoder is to minimize the difference between the input x and the reconstructed input x̂, using the loss function L(x, x̂) = ‖x − x̂‖²; training the autoencoder to minimize the loss function learns a compressed representation of the input data, which is used for feature conversion;
S233, model adjustment: based on the converted characteristic representation, parameters of the AIGC large model are adjusted in real time, the adjustment process adopts transfer learning, so that the AIGC large model is suitable for the background and semantic requirements of the specific field of the current problem, and the transfer learning process is as follows:
Pre-training a model on a source task, and learning a representation of source domain data;
migrating a portion of the pre-trained model to a target task;
Trimming the migrated model portion on the target domain data while maintaining or trimming other portions;
S234, enhanced processing-capability verification: a preset verification mechanism checks whether the model's ability to handle cold problems and technical terms is significantly enhanced after adjustment, ensuring the adjustment effect meets expectations;
The AIGC big model in S2 specifically includes:
enhancement information integration: integrating the adjusted representation of the user question and domain knowledge into an enhanced information set comprising adjusted question features, domain specific terms, concepts and their interrelationships;
context-aware coding: processing the enhanced information set with an encoder to capture complex relationships between deep semantic features of the problem and domain knowledge, the encoder outputting a high-dimensional feature representation of the comprehensive problem context and domain knowledge;
Answer generation: inputting the encoded high-dimensional characteristic representation into a AIGC large-model decoder, and generating a series of answer candidates by using a sequence generation mechanism by using the high-dimensional characteristic representation on the basis of considering the problem context and the domain knowledge;
S3: evaluating answer candidates to select an optimal answer, evaluating the answer candidates using a bundle search, which specifically includes:
initializing: setting beam width At the beginning of decoding, a size is initialized toEach candidate comprising a partial solution sequence with only a start tag;
And (3) iteration expansion: in each iteration, for each partial solution sequence in the bundle, the next vocabulary and its probability are predicted, and for each partial solution, the highest probability is selected Word, combine with this partial solution, form the new partial solution sequence;
Calculating the score: the score of each newly generated partial solution sequence is calculated by accumulating the logarithmic probabilities of its constituent words:
score(Y) = Σ_{t=1}^{T} log P(y_t | y_{<t}, c)
where Y is the partial solution sequence, y_t is the t-th word in the sequence, P(y_t | y_{<t}, c) is the conditional probability of word y_t given the preceding words y_{<t} and the context c, and T is the number of words in the sequence;
Selecting and retaining: after each iteration, the k highest-scoring partial solution sequences are selected from all newly generated partial solutions and added to the beam for the next round of iterative expansion;
termination condition: the iterative process continues until a predefined maximum length is reached, or a partial solution sequence in the bundle ends with an end-marker;
selecting the sequence with the highest score from the final bundle as an answer candidate, and selecting the sequence ranked at the front when a plurality of answer candidates are needed;
S4: and outputting the optimal answer to the user in the form of natural language.
2. The method for processing questions and answers in a language based on AIGC big models of claim 1, wherein S1 specifically comprises:
S11, receiving: receiving a natural language question input by a user through a user interface, wherein the user interface supports two modes of text input and voice input;
S12, pretreatment: preprocessing the problem input by the user, including removing irrelevant characters, correcting spelling errors, and converting voice input into text;
S13, grammar analysis: analyzing the questions by using natural language processing technology, and identifying sentence structures including sentence components of subjects, predicates and objects;
S14, semantic understanding: carrying out semantic analysis on the problem through a deep learning model and a natural language understanding algorithm, and understanding the intention and the contextual meaning of the problem;
S15, extracting key information: based on the results of the grammar analysis and semantic understanding, extracting key information in the problem, wherein the key information comprises:
Key words: the main nouns, verbs, and adjectives in a question refer to words of a particular concept, object, or action;
entity identification: specific entities mentioned in the question include name, place, organization, date;
relationship and attributes: relationships between entities implied in the problem and related attributes and features;
Question type: the type of question is determined based on the structure and wording of the question, including a factual query, an interpretation request, or an operation guide.
3. A language processing question-answering method based on a AIGC large model according to claim 1, wherein S4 further comprises selecting a formatting scheme according to the content and type of the answer, adding context information to the answer, and highlighting or emphasizing key information in the answer, including using bold, italics, or color changes to draw the user's attention to important parts.
4. A language processing question-answering system based on AIGC big models for implementing a language processing question-answering method based on AIGC big models according to any one of claims 1-3, comprising the following modules:
a user interface module: the method is responsible for receiving natural language questions input by a user and supporting text and voice form input of the questions;
problem understanding module: carrying out grammar analysis and semantic understanding on the problem input by the user by using a natural language processing technology, and extracting key information of the problem, wherein the key information comprises key words, entities, relations and problem types;
Domain adaptability enhancement processing module: the method comprises a domain identification sub-module, a domain knowledge graph matching sub-module and a domain adaptive algorithm sub-module, wherein the domain identification sub-module, the domain knowledge graph matching sub-module and the domain adaptive algorithm sub-module are used for determining the specific domain to which the problem belongs, matching related concepts in the domain knowledge graph and adjusting AIGC large models in real time;
Answer generation module: generating a series of answer candidates according to the context and domain knowledge of the questions by using AIGC large models subjected to domain adaptability enhancement processing, and optimizing an answer generation process by adopting a beam search algorithm;
Answer evaluation and selection module: and comprehensively evaluating answer candidates, including content overlapping measurement, semantic similarity measurement, language fluency check and grammar correctness verification, so as to select an optimal answer.
CN202410479542.5A 2024-04-22 2024-04-22 AIGC large model-based language processing question-answering system and method Active CN118093834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410479542.5A CN118093834B (en) 2024-04-22 2024-04-22 AIGC large model-based language processing question-answering system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410479542.5A CN118093834B (en) 2024-04-22 2024-04-22 AIGC large model-based language processing question-answering system and method

Publications (2)

Publication Number Publication Date
CN118093834A CN118093834A (en) 2024-05-28
CN118093834B true CN118093834B (en) 2024-08-02

Family

ID=91155253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410479542.5A Active CN118093834B (en) 2024-04-22 2024-04-22 AIGC large model-based language processing question-answering system and method

Country Status (1)

Country Link
CN (1) CN118093834B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118503394B (en) * 2024-07-17 2024-10-01 山东浪潮科学研究院有限公司 Self-adaptive decision method, system and storage medium based on large language model
CN118643802B (en) * 2024-08-13 2024-10-11 北京中数睿智科技有限公司 AI customer service synthesis information reliability evaluation method based on communication system large model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932723A (en) * 2023-07-28 2023-10-24 世优(北京)科技有限公司 Man-machine interaction system and method based on natural language processing
CN117556002A (en) * 2023-11-03 2024-02-13 山东浪潮科学研究院有限公司 Multi-round dialogue training method for large dialogue model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516229B (en) * 2019-07-10 2020-05-05 杭州电子科技大学 Domain-adaptive Chinese word segmentation method based on deep learning
CN117055724B (en) * 2023-05-08 2024-05-28 华中师范大学 Working method of generating teaching resource system in virtual teaching scene
CN116822625A (en) * 2023-05-17 2023-09-29 广西卓洁电力工程检修有限公司 Divergent-type associated fan equipment operation and detection knowledge graph construction and retrieval method
CN116881426B (en) * 2023-08-30 2023-11-10 环球数科集团有限公司 AIGC-based self-explanatory question-answering system
CN117235216A (en) * 2023-08-30 2023-12-15 电子科技大学 Knowledge reasoning method based on heterogeneous knowledge fusion
CN117171333B (en) * 2023-11-03 2024-08-02 国网浙江省电力有限公司营销服务中心 Electric power file question-answering type intelligent retrieval method and system
CN117521675A (en) * 2023-11-06 2024-02-06 腾讯科技(深圳)有限公司 Information processing method, device, equipment and storage medium based on large language model
CN117708277B (en) * 2023-11-10 2024-10-01 广州宝露软件开发有限公司 AIGC-based question and answer system and application method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932723A (en) * 2023-07-28 2023-10-24 世优(北京)科技有限公司 Man-machine interaction system and method based on natural language processing
CN117556002A (en) * 2023-11-03 2024-02-13 山东浪潮科学研究院有限公司 Multi-round dialogue training method for large dialogue model

Also Published As

Publication number Publication date
CN118093834A (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN111831789B (en) Question-answering text matching method based on multi-layer semantic feature extraction structure
CN108932342A (en) A kind of method of semantic matches, the learning method of model and server
CN118093834B (en) AIGC large model-based language processing question-answering system and method
CN113239169B (en) Answer generation method, device, equipment and storage medium based on artificial intelligence
CN110096567A (en) Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN117648429B (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN111666376B (en) Answer generation method and device based on paragraph boundary scan prediction and word shift distance cluster matching
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN111125333A (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN117609421A (en) Electric power professional knowledge intelligent question-answering system construction method based on large language model
CN115759254A (en) Question-answering method, system and medium based on knowledge-enhanced generative language model
CN112307179A (en) Text matching method, device, equipment and storage medium
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN117494815A (en) File-oriented credible large language model training and reasoning method and device
CN118228694A (en) Method and system for realizing industrial industry number intelligence based on artificial intelligence
CN112989803B (en) Entity link prediction method based on topic vector learning
CN117932066A (en) Pre-training-based &#39;extraction-generation&#39; answer generation model and method
CN111581365B (en) Predicate extraction method
CN113705207A (en) Grammar error recognition method and device
Alwaneen et al. Stacked dynamic memory-coattention network for answering why-questions in Arabic
CN114417880B (en) Interactive intelligent question-answering method based on power grid practical training question-answering knowledge base
CN114239575B (en) Statement analysis model construction method, statement analysis method, device, medium and computing equipment
CN113792550B (en) Method and device for determining predicted answers, reading and understanding method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant