CN117725172A - Multi-intention recognition supporting large model QA question-answering method and device - Google Patents

Multi-intention recognition supporting large model QA question-answering method and device

Info

Publication number
CN117725172A
CN117725172A (application number CN202311668450.3A)
Authority
CN
China
Prior art keywords
question
intent
model
final
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311668450.3A
Other languages
Chinese (zh)
Inventor
纪智辉 (Ji Zhihui)
李伟 (Li Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4u Beijing Technology Co ltd
Original Assignee
4u Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4u Beijing Technology Co ltd filed Critical 4u Beijing Technology Co ltd
Priority to CN202311668450.3A priority Critical patent/CN117725172A/en
Publication of CN117725172A publication Critical patent/CN117725172A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a large model QA question-answering method and device supporting multi-intent recognition, wherein the method comprises the following steps: in response to receiving a question entered by a user, multi-intent recognition is performed on the question, and the recognized intents are added to the question; the answer range for the question is narrowed with respect to the recognized intents, and a final question intent is determined based on the narrowed answer range; based on the final question intent, a response to the question is generated. The method and device solve the technical problem that existing traditional question-answering models cannot accurately identify the content that the question intends the model to answer.

Description

Multi-intention recognition supporting large model QA question-answering method and device
Technical Field
The application relates to the technical field of artificial intelligence (AI), and in particular to a large model QA question-answering method and device supporting multi-intent recognition.
Background
Traditional question-answering models are typically built with machine learning or deep learning methods. In these models, it is common practice to use structures such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to extract features from questions and answers and convert them into a representation the model can understand, such as word embedding vectors. The model is then trained on a large dataset of question-answer pairs, attempting to minimize the difference between the predicted answer and the actual answer. However, because these models may ignore the overall context when handling a question, they have certain shortcomings in understanding the question's intent and easily produce answers that miss the point of the question.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments provide a multi-intention recognition supporting large model QA question-answering method and device, which at least solve the technical problem that existing traditional question-answering models cannot accurately identify the content that the question intends the model to answer.
According to an aspect of the embodiments, there is provided a multi-intention recognition supporting large model QA question-answering method, including: in response to receiving a question entered by a user, multi-intent recognition is performed on the question, and the recognized intents are added to the question; the answer range for the question is narrowed with respect to the recognized intents, and a final question intent is determined based on the narrowed answer range; based on the final question intent, a response to the question is generated.
According to another aspect of the embodiments, there is also provided a multi-intention recognition supporting large model QA question-answering apparatus, including: an identification module configured to perform multi-intent recognition on a question in response to receiving the question entered by a user, and to add the recognized intents to the question; a determining module configured to narrow the answer range for the question with respect to the recognized intents, and determine a final question intent based on the narrowed answer range; and a generation module configured to generate a response to the question based on the final question intent.
In this embodiment, in response to receiving a question entered by a user, multi-intent recognition is performed on the question, and the recognized intents are added to the question; the answer range for the question is narrowed with respect to the recognized intents, and a final question intent is determined based on the narrowed answer range; based on the final question intent, a response to the question is generated. Through this scheme, the technical problem that existing traditional question-answering models cannot accurately identify the content that the question intends the model to answer is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a multi-intent recognition support large model QA question-answering method in accordance with embodiments of the present application;
FIG. 2 is a flow chart of another multi-intent recognition support large model QA question-answering method according to embodiments of the present application;
FIG. 3 is a flow chart of a model training method according to an embodiment of the present application;
FIG. 4 is a flow chart of a method for replying to real-time questions according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-intent recognition support large model QA question-answering device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a device for replying to real-time questions according to an embodiment of the present application;
fig. 7 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The relative arrangement of the components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that, for convenience of description, the sizes of the parts shown in the drawings are not drawn to actual scale. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not as a limitation; thus, other examples of the exemplary embodiments may have different values. It should be noted that like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, it need not be discussed further in subsequent figures.
Example 1
The embodiment of the application provides a multi-intention recognition supporting large model QA question-answering method, which comprises the following steps as shown in fig. 1:
step S102, responding to the received user input problem, performing multi-intention recognition on the problem, and adding the intention recognized by the multi-intention into the problem.
Sample data is obtained, the sample data comprising multi-intent data and single-intent data; word segmentation is performed on the sentence data corresponding to the sample data to obtain segmented word groups; and a preset mark is added to the segmented word groups, the mark being used to indicate the segmentation mode. A preset label is also added to the segmented word groups, the label being either single-intent or multi-intent; the model is trained based on the labels, and the model classifies the labels to determine the final question intent. The sample data comprises online QA question-answer pairs and intent labels for the online QA question-answer pairs.
In some embodiments, abstract definitions can further be applied to single intents to determine multiple intents. The number of multiple intents can be dynamically adjusted: only the minimum-unit double intent is given in the samples, and the label table can be modified during training to adjust the number of multiple intents. A minimal data-preparation sketch is given below.
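The following is a minimal, illustrative sketch of this data preparation, assuming jieba as the Chinese word-segmentation library (the patent does not name a specific tool); the intent names, example sentences, and two-intent label table are hypothetical.

```python
import jieba  # assumed segmentation library

INTENTS = ["query_weather", "query_date"]  # minimal two-intent label table

samples = [
    {"question": "今天天气怎么样", "intents": ["query_weather"]},                          # single intent
    {"question": "今天天气怎么样，今天几号", "intents": ["query_weather", "query_date"]},  # multi-intent
]

def prepare(sample):
    tokens = jieba.lcut(sample["question"])                              # word segmentation
    label = [1 if name in sample["intents"] else 0 for name in INTENTS]  # multi-hot label, e.g. [1, 1]
    return tokens, label

for s in samples:
    print(prepare(s))
```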
Specifically, a loss function of the model is determined using binary cross-entropy, and the model is trained with this loss function; the model is then run in batches, and the final performance of the model is verified via the F1 score. This example uses the binary_crossentropy loss to measure the feasibility of the model, with a final loss of 0.141. Batch runs were also performed on the model at the end to verify the result score of the final model via f1_score, with a final f1_score of 0.913.
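A minimal sketch of this loss/metric setup is shown below, assuming the Keras API and scikit-learn's f1_score; the toy model, input size, and random data are illustrative only (the 0.141 and 0.913 figures above come from the patent, not from running this code).

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

# toy multi-label model: one sigmoid output per intent label
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

# batch run on illustrative validation data, then F1 verification
x_val = np.random.rand(32, 8).astype("float32")
y_val = np.random.randint(0, 2, size=(32, 2))
y_pred = (model.predict(x_val) > 0.5).astype(int)
print("micro F1:", f1_score(y_val, y_pred, average="micro"))
```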
This embodiment performs multi-intent recognition on the question entered by the user and adds the recognized intents to the question. In this way, the intent of the question is better defined, so that the large model can more accurately understand the meaning of the question before answering it. In this manner, the accuracy and real-time performance of the question-answering model are improved, overcoming the shortcomings of traditional question-answering models in understanding question context. The whole method comprises sample data processing, model training, adoption of a loss function, and performance verification, all developed around multi-intent recognition, so the off-topic-answer problem of traditional question-answering models is effectively alleviated and the quality of the question-answering system is improved. In short, adding multi-intent recognition before the large model answers clarifies the meaning of the question, so the model can better capture the meaning the question intends to express, improving the accuracy and real-time performance of the model's answers.
Step S104, narrowing the answer range of the question aiming at the intent recognized by the multiple intents, and determining the final question intent based on the narrowed answer range.
The text of the segmented word groups is converted into a vector representation; the vector representation is compared with the intent categories in a predefined intent category library using cosine similarity; and based on the result of the comparison, the labels of the segmented word groups are classified to determine the final question intent.
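An illustrative sketch of this cosine-similarity matching follows; the intent names and reference vectors in the category library are assumptions, and in practice the vectors would come from the embedding model.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# predefined intent category library: intent name -> reference vector (illustrative)
intent_library = {
    "query_weather": np.array([0.9, 0.1, 0.0]),
    "query_date":    np.array([0.1, 0.9, 0.0]),
}

def classify(question_vec):
    # compare the question vector against every intent category and keep the best match
    scores = {name: cosine(question_vec, vec) for name, vec in intent_library.items()}
    best = max(scores, key=scores.get)
    return best, scores

print(classify(np.array([0.8, 0.2, 0.1])))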
Based on the identified multiple intents, this embodiment purposefully narrows the answer range to refine the context of the question, thereby determining the final question intent more accurately. Through this process, the method effectively improves the accuracy of question analysis, focusing the question-answering model on a specific question background and providing a more reliable basis for generating the final response. This strategy of narrowing the answer range helps reduce the model's ambiguity in complex contexts, further enhancing the accuracy and practicality of the question-answering system.
And step S106, generating a question response of the question based on the final question intention.
Existing traditional question-answering models cannot accurately identify the content that the question intends the model to answer, and often give off-topic answers. This embodiment provides a multi-intent recognition component for traditional question-answering services that works together with a traditional large model to improve the quality of question answering.
Example 2
The embodiment of the application provides another multi-intention recognition supporting large model QA question-answering method, as shown in fig. 2, which comprises the following steps:
step S202, collecting multi-intention/single-intention data.
The data source comprises online QA question-answer pairs, which are labeled with intents. By screening and selecting existing question-answer pairs, the diversity and representativeness of the dataset are ensured. Then, the QA question-answer pairs are intent-labeled, that is, each question is marked with its corresponding intent. The intent label may be single, representing a single intent, or multiple, representing multiple intents.
Step S204, acquiring question and answer information, and performing word segmentation on the question sentence data.
Question sentence data is extracted from the collected QA question-answer pairs. The question sentences are segmented so that each question is divided into a word group, in preparation for subsequent processing and analysis.
Step S206, adding a preset token in the segmented word group.
A preset token is added to the word groups obtained after segmenting the question sentences. These tokens may identify the beginning and end of a sentence, or mark a particular part or structure of the question. This helps the model better understand the context and structure of the question.
Step S208, the label table is specially defined.
For example, single intents: query weather → [1,0], query date → [0,1]; multi-intent: query weather and date → [1,1].
Step S210, model training.
A loss function of the model is determined using binary cross-entropy, and the model is trained with this loss function. The model is run in batches and the performance of the final model is verified, with the F1 score used to evaluate performance.
In this embodiment, the TensorFlow framework is adopted, BERT is used for the embedding conversion, an LSTM is used for training and evaluating results, and Adam is used as the optimizer.
Specifically, the method for training the model comprises the following steps as shown in fig. 3:
step S2102, a model is constructed.
The present embodiment uses the TensorFlow framework to build a deep learning model. The architecture of the model includes an embedding layer, an LSTM layer, an output layer, and so on. The embedding layer converts text data into vectors, and the LSTM layer processes sequence information. TensorFlow is a framework widely used for machine learning and deep learning; it provides highly flexible tools and interfaces that facilitate building, training, and evaluating complex neural network models. BERT is a pre-trained natural language processing model that preserves contextual information by mapping text into a high-dimensional vector space, which helps improve the model's understanding of the question. The long short-term memory network (LSTM) is a deep learning model suited to sequence data; it can effectively capture sequence information in text, so the semantics of the question sentences are better understood.
Specifically, the construction model comprises the following steps:
1) Embedding process (BERT embedding):
Input sequence: $X = (x_1, x_2, \ldots, x_T)$
Output of the embedding layer: $E = (e_1, e_2, \ldots, e_T)$
where $E$ denotes the embedded vector sequence, and each $x_i$ corresponds to an embedding vector $e_i$.
2) LSTM layer operation.
For one LSTM cell:
$i_t = \sigma(W_{ii} x_t + b_{ii} + U_{ii} h_{t-1} + c_{ii})$
$f_t = \sigma(W_{if} x_t + b_{if} + U_{if} h_{t-1} + c_{if})$
$g_t = \tanh(W_{ig} x_t + b_{ig} + U_{ig} h_{t-1} + c_{ig})$
$o_t = \sigma(W_{io} x_t + b_{io} + U_{io} h_{t-1} + c_{io})$
where $i_t$, $f_t$, $g_t$, $o_t$ denote the activations of the input gate, the forget gate, the cell-state update, and the output gate, respectively; $x_t$ is the input at the current time step; $h_{t-1}$ is the hidden state of the previous time step; $W$ and $U$ are the weights of the embedding layer and the LSTM layer; and $b$ and $c$ are the corresponding biases.
3) Output of LSTM layer:
$h_t = o_t \cdot \tanh(C_t)$
where $C_t$ is the cell state at the current time step, updated in the standard LSTM manner as $C_t = f_t \cdot C_{t-1} + i_t \cdot g_t$.
4) Model output layer (multi-label classification):
the output layer adopts a Sigmoid activation function to perform two classifications on the output of each label:
y i =σ(V i x+d i )
here, y i Representing the output of the ith tag, vi and di are the corresponding weights and biases, and σ (sigma) represents the sigmoid function.
This embodiment thus introduces the embedded vector sequence $E$, the weights $W$ and $U$ and the biases $b$ and $c$ of the embedding and LSTM layers, and the weights $V$ and biases $d$ of the output layer.
In this embodiment, introducing BERT for the embedding conversion allows the semantic information of the question sentences to be captured better: BERT is a pre-trained natural language processing model that, through contextual learning, can generate richer semantic representations. The introduction of the LSTM layer enables the model to better handle the sequence information of the question sentences; by memorizing cell states, the LSTM helps capture long-term dependencies in sentences and improves understanding of the question context. A minimal code sketch of this architecture is given below.
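The following is a minimal Keras sketch of the embedding → LSTM → sigmoid architecture described in this step. For brevity, a trainable Embedding layer stands in for the BERT embedding used in the patent, and the vocabulary size, dimensions, sequence length, and number of intent labels are illustrative assumptions.

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, HIDDEN, NUM_INTENTS, MAX_LEN = 30000, 128, 64, 2, 32

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")               # token-id sequence x_1..x_T
e = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)           # embedded sequence e_1..e_T
h = tf.keras.layers.LSTM(HIDDEN)(e)                                    # LSTM with gates i_t, f_t, g_t, o_t
outputs = tf.keras.layers.Dense(NUM_INTENTS, activation="sigmoid")(h)  # one sigmoid output per label
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```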
Step S2104, model configuration.
An optimizer of the model is configured. This embodiment selects the Adam optimizer and the binary_crossentropy loss function. The Adam optimizer helps increase the convergence speed of the model during training, while the binary cross-entropy loss function is well suited to per-label binary classification.
In configuring the Adam optimizer, the optimizer may be updated based on estimates of the first and second moments, momentum terms, a learning-rate decay parameter, a parameter controlling how the learning rate changes over time, the number of iterations, and bias corrections of the first and second moments. For example, the update rule of the Adam optimizer may take the following standard form:
$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$
$\hat{m}_t = \dfrac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \dfrac{v_t}{1-\beta_2^t}$
$\theta_t = \theta_{t-1} - \eta\, \dfrac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$
where $m_t$ and $v_t$ are the estimates of the first and second moments at time $t$, and $g_t$ is the gradient. $\beta_1$ and $\beta_2$ are the momentum terms, typically taken as 0.9 and 0.999, respectively. $\theta_t$ is the parameter set of the model, including the weights and biases of the neural network. $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first and second moments, $\eta$ is the learning rate, $\epsilon$ is a small constant added for numerical stability, and $t$ is the current iteration number. In addition, this embodiment introduces $a$, a learning-rate decay parameter, $\Delta\theta_{t-1} = \theta_t - \theta_{t-1}$, the change in the parameters, and $\rho$, a newly introduced parameter controlling the learning rate over time.
In this embodiment, the update rule of the Adam optimizer combines the concepts of momentum and an adaptive learning rate, which helps improve the convergence speed and stability of the model during training. By calculating first- and second-moment estimates of the gradient, Adam can dynamically adjust the learning rate, giving different parameters different learning rates and thus adapting more flexibly to gradients of different directions and magnitudes. This helps avoid learning-rate settings that are too large or too small, improving the optimization effect. In addition, Adam uses momentum to make the update direction smoother, which helps overcome oscillation during optimization and further improves the convergence speed and generalization ability of the model.
The Adam optimizer dynamically adjusts the learning rate based on the first- and second-moment estimates of the gradient, helping the model converge faster. The binary_crossentropy loss function is adopted, which is suitable for per-label binary classification. This loss function measures the difference between the model output and the actual labels, and minimizing this difference is the goal of the training process.
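A short sketch of one way to configure such an optimizer in Keras follows; the exponential learning-rate decay schedule and the specific hyperparameter values are common defaults assumed for illustration, not values fixed by the patent.

```python
import tensorflow as tf

# learning rate decays over iterations (an assumed decay schedule)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96)

optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule,
    beta_1=0.9,      # first-moment momentum term
    beta_2=0.999,    # second-moment momentum term
    epsilon=1e-7)    # small constant for numerical stability

# usage sketch: model.compile(optimizer=optimizer, loss="binary_crossentropy")
```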
In some embodiments, the loss function may be determined based on different types of regularization terms, a regularization-term weight parameter, the actual labels, and the predicted labels. For example, the binary cross-entropy loss function may be determined in the following manner:
$L(\theta) = -\dfrac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right] + \beta R_1(\theta) + (1-\beta) R_2(\theta)$
where $R_1(\theta)$ and $R_2(\theta)$ are two different types of regularization terms, and $\beta$ is a regularization-term weight parameter that controls the weights of the two regularization terms in the total loss. $N$ is the number of samples, $y_i$ is the actual label, and $\hat{y}_i$ is the predicted label output by the model.
Introducing regularization terms into the loss function helps control the complexity of the model, prevents overfitting, and improves the generalization ability of the model on unseen data. By penalizing the size of the model parameters, regularization makes the model more inclined to learn simple and general patterns rather than overfitting noise in the training data. This helps prevent the model from overfitting when faced with new data, thereby improving the robustness and practicality of the model. With proper regularization, the fit of the model to the training data and its generalization performance on unknown data can be effectively balanced, making the model both general and reliable. A sketch of such a regularized loss is given below.
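The sketch below shows one way such a regularized binary cross-entropy could be written in TensorFlow, assuming $R_1$ is an L1 term and $R_2$ an L2 term over the trainable weights; the choice of regularizers and the values of beta and lam are illustrative assumptions.

```python
import tensorflow as tf

def regularized_bce(model, beta=0.5, lam=1e-4):
    """Binary cross-entropy plus two weighted regularization terms (illustrative)."""
    bce = tf.keras.losses.BinaryCrossentropy()

    def loss_fn(y_true, y_pred):
        data_loss = bce(y_true, y_pred)
        r1 = tf.add_n([tf.reduce_sum(tf.abs(w)) for w in model.trainable_weights])     # R1: L1 term
        r2 = tf.add_n([tf.reduce_sum(tf.square(w)) for w in model.trainable_weights])  # R2: L2 term
        return data_loss + lam * (beta * r1 + (1.0 - beta) * r2)

    return loss_fn

# usage sketch: model.compile(optimizer=optimizer, loss=regularized_bce(model))
```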
This embodiment also introduces items such as the learning rate and regularization, making the configuration of the model more flexible. Per-layer learning rates and regularization terms help the optimizer adjust the model parameters better, improving the stability and convergence speed of model training. In addition, the introduced regularization terms effectively control the complexity of the model, prevent overfitting, and improve generalization on unseen data. Finally, through training over a number of epochs and performance verification, the model gradually adapts to the training data, and the final F1 score shows that the model performs excellently on the multi-intent classification task, with high accuracy and performance.
Step S2106, model training.
Training over a number of epochs is performed on the training data using the configured optimizer and loss function. This process continuously updates the model parameters through backpropagation, so the model gradually adapts to the training data and its intent-recognition ability improves.
Step S2108, performance verification.
After training is completed, performance verification is performed using a validation set or test set. The F1 score is used as the evaluation index: the model is run in batches, and the F1 score is calculated to evaluate the accuracy and generalization ability of the model.
The final loss of 0.141 indicates that the model gradually converged during training. The F1 score of 0.913 obtained from the batch-run verification shows that the model performs excellently on the multi-intent classification task, with high accuracy and generalization ability.
Example 3
The existing question-answering model is an offline model that cannot answer real-time questions such as "what day of the week is it today", "what is today's date", "what is the weather today", or "what movies have come out recently".
The embodiment of the application provides a method for replying to real-time questions. The method provides question-diversion capability: the traditional knowledge question-answering model answers traditional questions, while real-time questions are handed over to other APIs for real-time processing. As shown in fig. 4, the method comprises the following steps:
step S402, collecting single intention data.
The data sources comprise online QA question-answer pairs, with collection focused on real-time questions, and the online QA question-answer pairs are intent-labeled.
Specifically, data with a single intent is collected. These data come from online question-answer pairs (QA question-answer pairs), and the collection is biased towards real-time questions. Real-time questions include questions that need a timely reply, such as "what day of the week is it today", "what is today's date", "what is the weather today", and "what movies have come out recently". Then, data labeling is performed: the collected QA question-answer pairs go through intent labeling, and each question is labeled with its corresponding intent, which facilitates subsequent model training and classification.
Step S404, acquiring question and answer information, and performing word segmentation on the question sentence data.
Question sentence data is extracted from the collected QA question-answer pairs. These question-answer pairs may cover various topics, but in this step real-time questions are of particular interest, to meet the need to reply to real-time questions.
Word segmentation is performed on the acquired question sentences. Dividing each question into a word group makes the expression of the question easier for the model to handle. The word segmentation may use a common natural language processing tool or library, ensuring that the segmented words retain the semantic information of the original question.
The purpose of word segmentation is to prepare for subsequent processing (e.g., embedding conversion and model training). Segmented question data is easier to embed into the model, which improves the model's understanding of the question context and its capture of semantic information.
During word segmentation, attention must be paid to special cases such as ambiguity, stop words, and punctuation marks, so that the segmented word groups accurately reflect the meaning of the question and the influence of noise is reduced.
Since this step emphasizes real-time questions, it must also be ensured that the segmented data can be used effectively for the subsequent real-time operations, improving the reply efficiency for such questions.
Through step S404, the question sentence data is segmented, providing a clear and operable data form for the subsequent processing steps and preparing for model training and evaluation; a small segmentation sketch is given below.
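Below is an illustrative segmentation sketch with stop-word and punctuation filtering; jieba and the tiny stop-word list are assumed choices, not tools mandated by the patent.

```python
import re
import jieba  # assumed Chinese word-segmentation library

STOP_WORDS = {"的", "了", "吗", "呢"}  # illustrative stop-word list

def segment(question: str):
    tokens = jieba.lcut(question)
    # drop punctuation and stop words so the remaining words keep the question's meaning
    return [t for t in tokens if t not in STOP_WORDS and not re.fullmatch(r"\W+", t)]

print(segment("今天的天气怎么样？"))  # e.g. ['今天', '天气', '怎么样']
```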
Step S406, adding a preset mark in the segmented word group.
A predefined token is introduced for the segmented question sentence data. These tokens may carry special meanings, such as identifying the beginning and end of a sentence, or highlighting certain specific parts of a question.
The purpose of adding tokens is to provide the model with more information about the structure and context of the question. This helps improve the model's understanding of the question, enabling it to better capture its important features. For example, a "<start>" token may be added at the beginning of the question to indicate the start of the sentence, and an "<end>" token may be added at the end to indicate its end. The specific choice of these tokens may be tailored to the specific problem and model design.
When adding tokens, care must be taken not to destroy the semantic information of the original question sentence: the introduction of a token should not change the meaning of the question, but rather help the model understand and handle it better.
Since this embodiment focuses on handling real-time questions, it may also be considered whether to add a special token that identifies real-time questions, so that subsequent processing can handle such questions more flexibly.
Through step S406, preset tokens are introduced into the word groups of the question sentences, providing the model with richer context information and helping improve its handling of real-time questions; a brief sketch follows.
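A minimal sketch of wrapping segmented questions with preset tokens; "<start>", "<end>", and "<realtime>" are illustrative token names, not names fixed by the patent.

```python
def add_markers(tokens, is_realtime=False):
    """Wrap a segmented question with preset tokens (illustrative token names)."""
    marked = ["<start>"] + list(tokens) + ["<end>"]
    if is_realtime:
        marked.insert(1, "<realtime>")  # optional marker for real-time questions
    return marked

print(add_markers(["今天", "天气", "怎么样"], is_realtime=True))
# ['<start>', '<realtime>', '今天', '天气', '怎么样', '<end>']
```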
In step S408, a label is specifically defined.
Specific definitions are made for the labels, for example: single intent.
Specifically, for the collected question-answer data, particularly real-time questions, a corresponding intent label is defined for each question. These labels may be single, representing a single question intent, or multiple, representing several possible intents.
Depending on actual requirements, a binary representation may be used, in which each label is represented by a binary bit, or another encoding may be used. For example, a single intent may correspond to [1, 0, 0, ...], while multiple intents may correspond to [1, 1, 0, ...]. For instance, for a question that queries the weather, a single-intent label may be defined as [1, 0], where the first position indicates whether the intent is to query the weather and the second position represents other intents. For a question that queries both the weather and the date, a multi-intent label may be defined as [1, 1], where both positions indicate that the corresponding intent is present. If there are many categories, the labels must be defined reasonably according to the actual situation to ensure that each question receives the correct label. The definition of the labels must be consistent with the settings of the model output layer to ensure that labels match during training and prediction.
Through step S408, the labels are specially defined, providing accurate question-intent labels for training the model, so that the model can learn and predict the intent categories corresponding to different questions. This helps improve the performance of the model on intent classification tasks.
Step S410, model output.
The output is defined as a single-label output with softmax([y1, y2, y3, y4, ..., ym]), from which the intent class is judged. In this embodiment, the method for constructing and training the model is the same as in Embodiment 2 and is not repeated here.
This embodiment mainly aims to improve the question-answering capability of the whole system; the emphasis is on the labeling of real-time questions in the annotation data, which yields a better effect in the intent diversion performed before question answering. Compared with the prior art, this scheme broadens the question-answering range of the whole system so that it is no longer limited to traditional knowledge points, and helps users solve practical problems during question answering; a hedged sketch of this diversion flow follows.
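The following sketch illustrates the question-diversion idea under stated assumptions: an intent classifier first labels the question, real-time questions are routed to an external API handler, and all other questions go to the offline QA model. classify_intent, offline_qa, and realtime_api are hypothetical placeholders, not names defined by the patent.

```python
def answer(question, classify_intent, offline_qa, realtime_api):
    """Route a question to the offline QA model or a real-time API based on its intent."""
    intent = classify_intent(question)   # e.g. "realtime" or "knowledge"
    if intent == "realtime":
        return realtime_api(question)    # e.g. current date, weather, new movies
    return offline_qa(question)          # traditional knowledge QA model
```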
Alternatively, after the text is mapped to a vector, a cosine-similarity calculation against the defined intent category library can achieve a similar effect, but the accuracy and generalization of the results cannot be guaranteed.
The feasibility of the model was measured using the sparse_categorical_crossentropy loss, with a final loss of 0.085. Batch runs were also performed on the model at the end to verify the result score of the final model via f1_score, which is 0.942.
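A minimal sketch of the single-label (softmax) output head with sparse categorical cross-entropy follows; the layer sizes, vocabulary size, and number of intent classes are illustrative assumptions.

```python
import tensorflow as tf

VOCAB_SIZE, NUM_CLASSES = 30000, 4  # illustrative values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # softmax([y1..ym]) -> single intent class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])
```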
In this embodiment, the TensorFlow framework is adopted; in the experiments, BERT is used for the embedding conversion, an LSTM is used for training and evaluating results, and Adam is used as the optimizer.
Example 4
The embodiment of the application provides a multi-intention recognition supporting large model QA question-answering device, as shown in fig. 5, comprising: an identification module 52, a determination module 54, and a generation module 56.
The recognition module 52 is configured to, in response to receiving a question entered by a user, multi-intent recognition of the question and to add an intent of the multi-intent recognition to the question; the determination module 54 is configured to narrow down the answer range to the question for the intent identified by the multiple intent, and determine a final question intent based on the narrowed down answer range; the generation module 56 is configured to generate a question response for the question based on the final question intent.
It should be noted that: the multi-intention recognition supporting large model QA question-answering device provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the multi-intention recognition supporting large model QA question-answering device provided in the above embodiment and the multi-intention recognition supporting large model QA question-answering method embodiment belong to the same concept, and detailed implementation processes thereof are detailed in the method embodiment and are not described herein.
Example 5
The embodiment of the application provides a device for replying to real-time questions, as shown in fig. 6, comprising: a question classification module 62 and a generation module 64.
The question classification module 62 is configured to identify the intent of a question in response to receiving user input and to classify the question based on the identified intent; the generation module 64 is configured to generate a response to the question using different question-answering models based on the classification result.
It should be noted that: the real-time problem recovery device provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the device for replying the real-time problem and the method for replying the real-time problem provided in the above embodiments belong to the same concept, and detailed implementation processes of the device are shown in the method embodiments, which are not described herein.
Example 6
Fig. 7 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device includes a Central Processing Unit (CPU) 1001 that can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for system operation are also stored. The CPU1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. When executed by the central processing unit (CPU) 1001, the computer program performs the various functions defined in the methods and apparatus of the present application. In some embodiments, the electronic device may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the methods described in the embodiments below. For example, the electronic device may implement the steps of the method embodiments described above, and so on.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided in the present application, it should be understood that the disclosed terminal device may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, such as the division of the units, is merely a logical function division, and may be implemented in another manner, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application and are intended to be comprehended within the scope of the present application.

Claims (10)

1. A multi-intention recognition supporting large model QA question-answering method, comprising:
in response to receiving a question entered by a user, multi-intent recognition is performed on the question, and intent recognized by the multi-intent is added to the question;
narrowing the answer range for the question with respect to the intent identified by the multi-intent, and determining a final question intent based on the narrowed answer range;
based on the final question intent, a question response for the question is generated.
2. The method of claim 1, wherein multi-intent recognition of the problem comprises: multi-intent recognition of the problem is performed using a pre-trained model, wherein the model is derived by:
obtaining sample data, the sample data comprising multi-intent data and single-intent data;
word segmentation is carried out on sentence data corresponding to the sample data, and segmented word groups are obtained;
adding a preset label in the segmented word group, wherein the label comprises single intention and multiple intention;
training the model based on the labels, wherein the model classifies the labels to determine final problem intent.
3. The method of claim 2, wherein training the model comprises:
determining a loss function of the model using binary cross entropy and training the model using the loss function;
and running the models in batches, and verifying the final performance result of the models through the F1 score.
4. The method of claim 2, wherein the model categorizes the tags to determine a final problem intent, comprising:
converting the text of the segmented word group into vector representation;
comparing the vector representation with intent categories in a predefined library of intent categories using cosine similarity;
based on the result of the comparison, the labels of the segmented word groups are classified to determine a final problem intent.
5. The method of claim 2, wherein the sample data includes an online QA question-answer pair and an intent label for the online QA question-answer pair.
6. The method of claim 2, wherein after obtaining the segmented word groups, the method further comprises: and adding a preset mark in the segmented word group, wherein the mark is used for marking the segmentation mode.
7. The method of claim 2, wherein single intents are abstractly defined to determine the multiple intents, wherein the number of multiple intents is dynamically adjustable and can be adjusted by modifying the label.
8. A multi-intention recognition supporting large model QA question-answering apparatus, comprising:
an identification module configured to identify multiple intents for a question in response to receiving a user input, and to add the intent identified by the multiple intents to the question;
a determining module configured to narrow down an answer range for the question for the intent identified by the multiple intent, and determine a final question intent based on the narrowed-down answer range;
a generation module configured to generate a question response for the question based on the final question intent.
9. An electronic device, comprising:
a memory configured to store a computer program;
a processor configured to cause a computer to perform the method of any one of claims 1 to 7 when the program is run.
10. A computer-readable storage medium, on which a program is stored, characterized in that the program, when run, causes a computer to perform the method of any one of claims 1 to 7.
CN202311668450.3A 2023-12-06 2023-12-06 Multi-intention recognition supporting large model QA question-answering method and device Pending CN117725172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311668450.3A CN117725172A (en) 2023-12-06 2023-12-06 Multi-intention recognition supporting large model QA question-answering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311668450.3A CN117725172A (en) 2023-12-06 2023-12-06 Multi-intention recognition supporting large model QA question-answering method and device

Publications (1)

Publication Number Publication Date
CN117725172A true CN117725172A (en) 2024-03-19

Family

ID=90202722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311668450.3A Pending CN117725172A (en) 2023-12-06 2023-12-06 Multi-intention recognition supporting large model QA question-answering method and device

Country Status (1)

Country Link
CN (1) CN117725172A (en)

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN110110062B (en) Machine intelligent question and answer method and device and electronic equipment
CN111950269A (en) Text statement processing method and device, computer equipment and storage medium
CN110309514A (en) A kind of method for recognizing semantics and device
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
US20220351634A1 (en) Question answering systems
CN109933792A (en) Viewpoint type problem based on multi-layer biaxially oriented LSTM and verifying model reads understanding method
CN116992005B (en) Intelligent dialogue method, system and equipment based on large model and local knowledge base
CN113434683A (en) Text classification method, device, medium and electronic equipment
CN110489730A (en) Text handling method, device, terminal and storage medium
CN112906398B (en) Sentence semantic matching method, sentence semantic matching system, storage medium and electronic equipment
Al-Besher et al. BERT for Conversational Question Answering Systems Using Semantic Similarity Estimation.
CN113705207A (en) Grammar error recognition method and device
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN115600595A (en) Entity relationship extraction method, system, equipment and readable storage medium
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN117725172A (en) Multi-intention recognition supporting large model QA question-answering method and device
CN117743530A (en) Method and device for replying real-time class problems based on single intention
CN115146021A (en) Training method and device for text retrieval matching model, electronic equipment and medium
CN115114910B (en) Text processing method, device, equipment, storage medium and product
US11934794B1 (en) Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system
US11755570B2 (en) Memory-based neural network for question answering
CN117591666B (en) Abstract extraction method for bridge management and maintenance document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination