CN117235239A - Active dialogue large model construction device, method, equipment and storage medium - Google Patents

Active dialogue large model construction device, method, equipment and storage medium

Info

Publication number
CN117235239A
CN117235239A (application CN202311499786.1A)
Authority
CN
China
Prior art keywords
model
preset
training
question
questioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311499786.1A
Other languages
Chinese (zh)
Other versions
CN117235239B (en)
Inventor
刘伟华
马金民
李林
魏欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202311499786.1A priority Critical patent/CN117235239B/en
Publication of CN117235239A publication Critical patent/CN117235239A/en
Application granted granted Critical
Publication of CN117235239B publication Critical patent/CN117235239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses an active dialogue large model construction device, method, equipment and storage medium, relating to the field of model construction, comprising: a model determining module, used for determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models; a training set construction module, used for inputting a user's real medical record information into the preset questioning model to generate false samples and constructing a first training set based on the false samples and true samples; a controller training module, used for inputting the first training set into an initial controller, so as to obtain a target controller by performing gradient updates on the parameters of the initial controller using an adversarial training method and a pre-constructed objective function; and a large model construction module, used for constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue. By constructing the active dialogue large model, the application can actively ask the user questions to acquire more user information, thereby improving the accuracy of the inquiry result.

Description

Active dialogue large model construction device, method, equipment and storage medium
Technical Field
The present application relates to the field of model construction, and in particular, to an active dialogue large model construction device, method, equipment, and storage medium.
Background
Currently, large models such as ChatGPT (Chat Generative Pre-trained Transformer), ChatGLM (Chat General Language Model) and LLaMA (Large Language Model Meta AI) exhibit striking dialogue capability; whether in single-round or multi-round dialogue, their answering capability has already exceeded the level of an ordinary person. For example, ChatGPT 4.0 basically meets people's daily needs for knowledge search and question answering.
However, such large models do not have human-like thinking ability; that is, they cannot carry on a conversation the way a real person does, and they generally have the following drawbacks: incorrect answers, lack of initiative, biased answers, lengthy answers, and so on, and in particular they do not ask questions actively. In professional fields, especially medical inquiry, a dialogue robot based on a large model cannot perceive the user's illness information by itself; the user has to input all of the illness information into the model at once, which is very difficult in practice, since an ordinary patient cannot accurately provide all of the effective information to the model. The ideal scenario here is that of a professional doctor who keeps communicating with and questioning the user to obtain the accurate information required for disease diagnosis. This is the shortcoming of current large models: the user can only keep asking questions, and the machine can only proceed in the form of summarized answers.
Disclosure of Invention
Accordingly, the present application is directed to an active dialogue large model construction device, method, equipment and storage medium, which construct an active dialogue large model that actively asks the user questions to obtain more user information, thereby improving the accuracy of the inquiry result. The specific scheme is as follows:
In a first aspect, the present application provides an active dialogue large model construction device, including:
the model determining module is used for determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models;
the training set construction module is used for inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples; the true samples are samples determined based on a second training set used for training the preset diagnosis model;
the controller training module is used for inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller;
and the large model construction module is used for constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue by using the active dialogue large model.
Optionally, the model determining module includes:
the training set determining unit is used for collecting historical question-answer contents of a plurality of users and carrying out inversion processing on the historical question-answer contents to obtain a third training set;
and the questioning model training unit is used for training a first generative pre-trained Transformer model by using the third training set to obtain the preset questioning model.
Optionally, the model determining module includes:
the diagnosis model training unit is used for determining the historical question-answer contents and the corresponding historical inquiry results of the plurality of users as a second training set, and training a second generative pre-trained Transformer model by using the second training set to obtain the preset diagnosis model.
Optionally, the training set construction module includes:
and the training set construction unit is used for constructing a first training set based on the false sample, the first true sample and the second true sample.
Optionally, the active dialogue large model construction device further includes:
the first true sample determining unit is used for deleting the historical inquiry results contained in each training sample in the second training set so as to obtain the first true sample;
a second true sample determining unit, configured to determine the historical question-answer content and the corresponding number of question-answer rounds contained in any training sample in the second training set, and determine the second true sample based on the target historical question-answer content corresponding to the first N rounds; the first N rounds are rounds 1 through N, where N is smaller than the total number of question-answer rounds.
Optionally, the active dialogue large model construction device further includes:
a controller construction unit for constructing the initial controller based on an autoencoder and a classifier;
and an objective function construction unit for constructing the objective function based on a first loss function corresponding to the initial controller, a classification loss function corresponding to the classifier, and a second loss function corresponding to the preset questioning model.
Optionally, the large model building module includes:
the question-answer content acquisition unit is used for acquiring the question-answer contents of all rounds completed by the current target user, so as to obtain the current historical question-answer content of the target user;
the diagnosable rate determining unit is used for inputting the current historical question-answer content into the active dialogue large model, so as to determine, through the target controller, the diagnosable rate corresponding to the current historical question-answer content;
the question data generating unit is used for, if the diagnosable rate is not greater than a preset threshold, inputting the current historical question-answer content into the preset questioning model to generate current question data for the target user, and, when the target user's answer to the current question data is acquired, jumping back to the step of acquiring the question-answer contents of all rounds completed by the current target user;
and the inquiry result generating unit is used for, if the diagnosable rate is greater than the preset threshold, inputting the current historical question-answer content into the preset diagnosis model to generate a corresponding inquiry result.
In a second aspect, the present application provides an active dialogue large model construction method, including:
determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models;
inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples; the true samples are samples determined based on a second training set used for training the preset diagnosis model;
inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller;
and constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller so as to conduct questioning dialogue by using the active dialogue large model.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the active dialogue large model construction method.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the active dialogue large model construction method described above.
In the application, the model determining module is used for determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models; the training set construction module is used for inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples, the true samples being samples determined based on a second training set used for training the preset diagnosis model; the controller training module is used for inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller; and the large model construction module is used for constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue by using the active dialogue large model. Thus, on the one hand, the application determines the preset questioning model and the preset diagnosis model based on generative pre-trained Transformer models, so that the preset questioning model is used to actively question the user and generate corresponding question data, and the preset diagnosis model is used to generate corresponding inquiry results; on the other hand, the application constructs the first training set from the false samples generated by the preset questioning model and the true samples determined from the second training set used for training the preset diagnosis model, so as to perform adversarial training on the controller, improve the discrimination capability of the controller, and make the question data generated by the preset questioning model closer to real data; in still another aspect, the application constructs the active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so that the active dialogue large model actively asks the user questions, thereby acquiring more user information and making the inquiry results generated by the preset diagnosis model more accurate and reliable.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an active dialogue large model construction device according to the present application;
FIG. 2 is a schematic diagram of a controller according to the present disclosure;
FIG. 3 is a flow chart of a consultation dialogue disclosed in the present application;
FIG. 4 is a flow chart of an active dialogue large model construction method disclosed by the application;
fig. 5 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Existing large models do not have human-like thinking ability; that is, they cannot communicate the way a real person does, and they have the following defects: incorrect answers, lack of initiative, biased answers, lengthy answers, and so on, and in particular they do not ask questions actively. Therefore, the application provides an active dialogue large model construction device, which constructs an active dialogue large model that actively asks the user questions to obtain more user information, thereby improving the accuracy of the inquiry result.
Referring to fig. 1, an embodiment of the present application discloses an active dialogue large model construction device, which includes:
the model determining module 11 is configured to determine a preset questioning model and a preset diagnosis model based on the generated pre-training transducer model.
In this embodiment, the model determining module may include: a training set determining unit, used for collecting historical question-answer contents of a plurality of users and performing inversion processing on the historical question-answer contents to obtain a third training set; and a questioning model training unit, used for training a first generative pre-trained Transformer model with the third training set to obtain the preset questioning model. It can be understood that the main function of the preset questioning model is to generate corresponding question data according to the user's question-answer content. Therefore, to train the preset questioning model, the normal historical question-answer contents of a plurality of users need to be obtained, the users' historical answer contents are taken as features, and the corresponding historical question data are taken as labels, so as to construct the third training set; the third training set is then used to train a first Generative Pre-Trained Transformer (GPT) model to obtain the preset questioning model. A minimal sketch of this inversion step is given below.
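As a minimal, hypothetical sketch of this inversion step (the record format, field names and function name below are illustrative assumptions, not part of the application), multi-round question-answer records can be turned into (dialogue-so-far, next-question) pairs:

```python
# Hypothetical sketch of the inversion step: each multi-round consultation record is
# turned into (question-answer content seen so far, next doctor question) pairs so
# that a GPT-style model can learn to generate the next question.

def build_third_training_set(records):
    """records: list of dialogues, each a list of (doctor_question, user_answer) turns."""
    samples = []
    for dialogue in records:
        history = []
        for doctor_question, user_answer in dialogue:
            if history:
                # Feature: the dialogue content seen so far;
                # label: the question the doctor actually asked next.
                samples.append({"context": " ".join(history),
                                "next_question": doctor_question})
            history += [doctor_question, user_answer]
    return samples

# Toy two-round consultation record.
record = [[("Where does it hurt?", "My lower back."),
           ("How long has it lasted?", "About two weeks.")]]
print(build_third_training_set(record))
```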
In this embodiment, the model determining module may also include: a diagnosis model training unit, used for determining the historical question-answer contents and the corresponding historical inquiry results of the plurality of users as a second training set, and training a second generative pre-trained Transformer model with the second training set to obtain the preset diagnosis model. It can be understood that the main function of the preset diagnosis model is to generate a corresponding inquiry result according to the user's question-answer content. Therefore, to train the preset diagnosis model, the normal historical question-answer contents of the plurality of users can be taken as features and the corresponding historical inquiry results as labels, so as to construct the second training set; the second training set is then used to train the second generative pre-trained Transformer model to obtain the preset diagnosis model.
The training set construction module 12 is configured to input real medical record information of a user into the preset questioning model to generate corresponding false samples, and to construct a first training set based on the false samples and true samples; the true samples are samples determined based on the second training set used for training the preset diagnosis model.
In this embodiment, for the generation of false samples, the user's real medical record information is input into the preset questioning model Gpt-1 to generate corresponding question data. Since the question data at this point is model-generated, it cannot be determined whether an inquiry result can be obtained from it, so the generated question data is defined as a false sample. In this way, training the controller with the false samples can improve the discrimination capability of the controller.
In this embodiment, constructing the first training set based on the false samples and the true samples may include: constructing the first training set based on the false samples, the first true samples and the second true samples. It can be understood that the first training set used to train the controller needs to include the false samples generated by the preset questioning model as well as the first and second true samples determined from the second training set used for training the preset diagnosis model, so that the first training set contains both true and false samples and the controller is trained with respect to both the preset questioning model and the preset diagnosis model.
In this embodiment, the active dialogue large model construction device further includes: a first true sample determining unit, used for deleting the historical inquiry result contained in each training sample in the second training set to obtain the first true samples; and a second true sample determining unit, used for determining the historical question-answer content and the corresponding number of question-answer rounds contained in any training sample in the second training set, and determining the second true samples based on the target historical question-answer content corresponding to the first N rounds, where the first N rounds are rounds 1 through N and N is smaller than the total number of question-answer rounds. It can be understood that the first true samples can be obtained by removing the historical inquiry result contained in each training sample of the second training set used for training the preset diagnosis model. For example, if a training sample in the second training set contains four rounds of historical question-answer content and the corresponding historical inquiry result, the historical inquiry result is deleted and the remaining four rounds of historical question-answer content form a first true sample. It should be noted that if a first true sample is input into the preset diagnosis model, a corresponding inquiry result can be generated. For the second true samples, the historical question-answer content and the corresponding number of question-answer rounds contained in any training sample of the second training set are first determined, and the second true samples are then determined from the target historical question-answer content corresponding to the first N rounds, where the first N rounds are rounds 1 through N and N is smaller than the total number of rounds. It should be noted that if a second true sample is input into the preset diagnosis model, a corresponding inquiry result cannot be generated because the question-answer content is incomplete. For example, if a training sample in the second training set contains four rounds of historical question-answer content, then N is less than 4; that is, the target historical question-answer contents corresponding to the first one, the first two and the first three rounds, respectively, can each be determined as a second true sample. A sketch of this sample construction is given below.
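The following is a minimal sketch, under assumed data structures (the dictionary keys and helper function are hypothetical), of how the first and second true samples could be derived from one sample of the second training set:

```python
# Hypothetical sketch: derive the first true sample (all rounds, diagnosis result removed)
# and the second true samples (the first N rounds, N smaller than the total number of
# rounds) from one second-training-set sample.

def build_true_samples(training_sample):
    """training_sample: {"rounds": [round_1, ..., round_k], "result": historical_result}."""
    rounds = training_sample["rounds"]
    # First true sample: keep every round, delete the historical inquiry result.
    first_true = {"rounds": list(rounds)}
    # Second true samples: every strict prefix of the dialogue (rounds 1..N, N < total).
    second_true = [{"rounds": rounds[:n]} for n in range(1, len(rounds))]
    return first_true, second_true

sample = {"rounds": ["round 1", "round 2", "round 3", "round 4"], "result": "diagnosis"}
first_true, second_true = build_true_samples(sample)
print(first_true)         # four rounds, no result
print(len(second_true))   # 3: the first one, two and three rounds
```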
The controller training module 13 is configured to input the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller.
In this embodiment, under normal conditions the preset questioning model can only generate corresponding question data from the user's question-answer content, and the preset diagnosis model can only generate a corresponding inquiry result from the user's question-answer content; simply joining the two models together easily leads to an endless loop. Therefore, the application designs a controller to combine the two models effectively: according to the diagnosable rate calculated by the controller, it can be decided whether the preset questioning model should continue to generate question data or the preset diagnosis model should generate the inquiry result. The controller is trained with the idea of adversarial training: the controller (acting as the discriminator) is trained with the first training set constructed from the first true samples, the second true samples, and the false samples generated by the preset questioning model (acting as the generator). Specifically, the initial controller is adversarially trained with the first training set, the parameters of the initial controller are updated by gradient descent on the pre-constructed objective function, and training ends when the number of training rounds reaches a preset number or the controller's accuracy reaches a preset accuracy threshold, yielding the trained target controller. The target controller can then calculate the corresponding diagnosable rate from the input current historical question-answer content of a target user. In this way, by training the initial controller with a first training set containing both true and false samples, the question data generated by the preset questioning model (which produces the false samples) can be made closer to real data, and the discrimination capability of the controller is improved. A minimal sketch of such a training loop is given below.
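The following is a minimal runnable sketch of such an adversarial loop, using toy feature vectors in place of encoded dialogues; the controller plays the discriminator, and a small stand-in network plays the preset questioning model. Layer sizes, optimizers, batch size and the number of rounds are illustrative assumptions only, not the application's values.

```python
# Toy adversarial training loop (an assumption-laden sketch, not the patent's code):
# the controller is updated to tell true samples from false samples, while the
# generator (standing in for the preset questioning model) is fine-tuned so that
# its false samples look more like real data.
import torch
from torch import nn

dim = 32
controller = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
generator = nn.Sequential(nn.Linear(dim, dim))            # stand-in for the questioning model
opt_d = torch.optim.Adam(controller.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(100):                                   # preset number of training rounds
    true_batch = torch.randn(16, dim)                     # stands in for true samples
    fake_batch = generator(torch.randn(16, dim))          # stands in for false samples

    # Gradient update of the controller (discriminator): true -> 1, false -> 0.
    loss_d = bce(controller(true_batch), torch.ones(16, 1)) + \
             bce(controller(fake_batch.detach()), torch.zeros(16, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Fine-tune the generator so its false samples are judged closer to real data.
    loss_g = bce(controller(fake_batch), torch.ones(16, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```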
In this embodiment, the active dialogue large model construction device further includes: a controller construction unit, used for constructing the initial controller based on an autoencoder and a classifier; and an objective function construction unit, used for constructing the objective function based on a first loss function corresponding to the initial controller, a classification loss function corresponding to the classifier, and a second loss function corresponding to the preset questioning model. It can be understood that, as shown in FIG. 2, the initial controller is constructed from an autoencoder and a classifier cls; correspondingly, the objective function of the controller is constructed from the first loss function corresponding to the initial controller, the classification loss function corresponding to the classifier, and the second loss function corresponding to the preset questioning model. It should be noted that while the parameters of the initial controller are gradient-updated based on the objective function of the controller, the parameters of the preset questioning model are gradient-updated based on the objective function of the preset questioning model, so as to fine-tune the preset questioning model and obtain a questioning model with a better effect. The notation involved is as follows:
Loss_d denotes the objective function of the controller; Loss_g denotes the objective function of the preset questioning model; ∇_d denotes the gradient with respect to the controller's parameters; ∇_g denotes the gradient with respect to the preset questioning model's parameters; m denotes the number of training samples; i indexes the i-th training sample; x_i denotes the i-th true sample; x̃_i denotes the false sample generated from the i-th true sample; L_1 denotes the first loss function of the initial controller; L_cls denotes the classification loss function of the classifier; and L_2 denotes the second loss function of the preset questioning model.
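As a hedged illustration only (the composition and weighting of the terms are assumptions, not the application's published formulas), a GAN-style pair of objectives consistent with the notation above could take the following form:

```latex
% Assumed, GAN-style reconstruction consistent with the symbol legend above;
% the exact terms and weights used in the application are assumptions here.
\begin{aligned}
\mathrm{Loss}_d &= \frac{1}{m}\sum_{i=1}^{m}
  \Big[\, \mathcal{L}_1(x_i,\tilde{x}_i) + \mathcal{L}_{cls}(x_i,\tilde{x}_i) \Big],
  &\quad \theta_d &\leftarrow \theta_d - \eta \, \nabla_d \,\mathrm{Loss}_d, \\
\mathrm{Loss}_g &= \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}_2(\tilde{x}_i),
  &\quad \theta_g &\leftarrow \theta_g - \eta \, \nabla_g \,\mathrm{Loss}_g .
\end{aligned}
```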
The large model construction module 14 is configured to construct an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue by using the active dialogue large model.
In this embodiment, an active dialogue large model is constructed from the preset questioning model, the preset diagnosis model and the target controller. The preset questioning model in the active dialogue large model actively asks the user questions to obtain more user information; the target controller in the active dialogue large model calculates the diagnosable rate from the obtained user information; the diagnosable rate is then compared with a preset threshold to decide whether the preset questioning model should continue generating question data or the preset diagnosis model should generate the inquiry result.
Therefore, on the one hand, the application determines the preset questioning model and the preset diagnosis model based on generative pre-trained Transformer models, so that the preset questioning model is used to actively question the user and generate corresponding question data, and the preset diagnosis model is used to generate corresponding inquiry results; on the other hand, the application constructs the first training set from the false samples generated by the preset questioning model and the true samples determined from the second training set used for training the preset diagnosis model, so as to perform adversarial training on the controller, improve the discrimination capability of the controller, and make the question data generated by the preset questioning model closer to real data; in still another aspect, the application constructs the active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so that the active dialogue large model actively asks the user questions, thereby acquiring more user information and making the inquiry results generated by the preset diagnosis model more accurate and reliable.
On the basis of any of the above embodiments, the large model construction module 14 includes:
the question-answer content acquisition unit is used for acquiring the question-answer contents of all rounds completed by the current target user, so as to obtain the current historical question-answer content of the target user.
The diagnosable rate determining unit is used for inputting the current historical question-answer content into the active dialogue large model, so as to determine, through the target controller, the diagnosable rate corresponding to the current historical question-answer content.
In this embodiment, considering that under normal conditions a user cannot input complete and effective medical record information in one go, more useful information has to be obtained through continued questioning to help the model reach a more accurate diagnosis; that is, the dialogue between the model and the user usually spans multiple rounds, and after each round ends it must be judged comprehensively, from the question-answer contents of all rounds completed by the current target user, whether a corresponding inquiry result can be obtained. Specifically, the question-answer contents of all rounds completed by the current target user are acquired and taken as the current historical question-answer content of the target user; the current historical question-answer content is then input into the target controller in the active dialogue large model, and the target controller calculates the corresponding diagnosable rate from it. The diagnosable rate characterizes whether an inquiry result can be obtained from the current historical question-answer content of the target user.
The question data generating unit is used for, if the diagnosable rate is not greater than a preset threshold, inputting the current historical question-answer content into the preset questioning model to generate current question data for the target user, and, when the target user's answer to the current question data is acquired, jumping back to the step of acquiring the question-answer contents of all rounds completed by the current target user.
In this embodiment, if the diagnosable rate is less than or equal to the preset threshold, this indicates that an inquiry result cannot yet be determined from the current historical question-answer content of the target user and more user information needs to be acquired. The current historical question-answer content of the target user is therefore input into the preset questioning model in the active dialogue large model to generate relevant current question data. When the target user replies to the current question data, that is, when the target user's answer to the current question data is acquired, the question-answer contents of all rounds completed by the current target user are re-acquired to form the new current historical question-answer content, which is re-evaluated by the target controller in the active dialogue large model to obtain a new diagnosable rate.
The inquiry result generation unit is used for, if the diagnosable rate is greater than the preset threshold, inputting the current historical question-answer content into the preset diagnosis model to generate a corresponding inquiry result.
In this embodiment, if the diagnosable rate is greater than the preset threshold, this indicates that an inquiry result can be obtained from the current historical question-answer content of the target user; the current historical question-answer content is then input into the preset diagnosis model in the active dialogue large model to generate the corresponding inquiry result, which improves the accuracy and reliability of the inquiry result.
In this embodiment, as shown in FIG. 3, the question-answer content of the first round (T-1) completed by the current target user is first acquired and input into the target controller in the active dialogue large model to obtain the diagnosable rate D_rate for T-1, and it is then judged whether this diagnosable rate is greater than the preset threshold 0.5. If the diagnosable rate for T-1 is less than or equal to 0.5, i.e. 1-D_rate is greater than or equal to 0.5, the question-answer content of T-1 is input into the preset questioning model Gpt-1 in the active dialogue large model to generate current question data for the target user; when the target user's answer to this question data is acquired, the question-answer content of the first two rounds (T-2) completed by the current target user is re-acquired, input into the target controller, and evaluated by the target controller to obtain the diagnosable rate D_rate for T-2. If the diagnosable rate for T-2 is still less than or equal to 0.5, the question-answer content of T-2 is input into the preset questioning model Gpt-1 to generate new question data; when the answer is acquired, the question-answer content of the first three rounds (T-3) is re-acquired, input into the target controller and evaluated to obtain the diagnosable rate D_rate for T-3. If the diagnosable rate for T-3 is still less than or equal to 0.5, the question-answer content of T-3 is input into the preset questioning model Gpt-1 to generate further question data; when the answer is acquired, the question-answer content of the first four rounds (T-4) is re-acquired, input into the target controller and evaluated to obtain the diagnosable rate D_rate for T-4. If the diagnosable rate for T-4 is greater than 0.5, i.e. 1-D_rate is now less than 0.5, this indicates that an inquiry result can be obtained from the question-answer content of T-4; the question-answer content of T-4 is therefore input into the preset diagnosis model Gpt-2 in the active dialogue large model to generate the inquiry result T-final corresponding to the question-answer content of the first four rounds. A sketch of this loop is given below.
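Under the assumption that `controller`, `questioning_model`, `diagnosis_model` and `ask_user` are placeholders for the trained target controller, the preset questioning model Gpt-1, the preset diagnosis model Gpt-2 and the collection of the user's reply, the consultation loop of FIG. 3 can be sketched as follows (threshold 0.5 as in the example above):

```python
# Hypothetical sketch of the FIG. 3 consultation loop; all callables are placeholders.
def consultation_dialogue(controller, questioning_model, diagnosis_model, ask_user,
                          first_question, threshold=0.5):
    history = [first_question, ask_user(first_question)]   # round T-1
    while True:
        d_rate = controller(history)                        # diagnosable rate D_rate
        if d_rate > threshold:
            return diagnosis_model(history)                 # inquiry result T-final
        # Not diagnosable yet: actively generate the next question (Gpt-1) and ask it.
        question = questioning_model(history)
        answer = ask_user(question)
        history += [question, answer]                        # rounds T-2, T-3, ...
```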
Therefore, the application takes the question-answer contents of all rounds completed by the current target user as the current historical question-answer content, calculates the diagnosable rate from this content through the target controller in the active dialogue large model, and judges from the diagnosable rate whether an inquiry result can be obtained. If the diagnosable rate is less than or equal to the preset threshold, current question data for the target user is actively generated through the preset questioning model in the active dialogue large model so as to obtain more user information, giving the preset questioning model the ability to ask questions actively; when the diagnosable rate is greater than the preset threshold, the inquiry result corresponding to the current historical question-answer content is generated through the preset diagnosis model in the active dialogue large model, which improves the accuracy and reliability of the inquiry result.
The above embodiment describes the functions of the modules of the application in detail: by constructing an active dialogue large model that actively asks the user questions, more user information can be acquired and the accuracy of the inquiry result is improved. The following embodiment describes an active dialogue large model construction method. Referring to FIG. 4, an embodiment of the application discloses an active dialogue large model construction method, which includes the following steps:
and S11, determining a preset questioning model and a preset diagnosis model based on the generated pre-training transducer model.
Step S12, inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples; the true samples are samples determined based on a second training set used for training the preset diagnosis model.
Step S13, inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining the target controller.
Step S14, constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue by using the active dialogue large model.
Therefore, on the one hand, the application determines the preset questioning model and the preset diagnosis model based on generative pre-trained Transformer models, so that the preset questioning model is used to actively question the user and generate corresponding question data, and the preset diagnosis model is used to generate corresponding inquiry results; on the other hand, the application constructs the first training set from the false samples generated by the preset questioning model and the true samples determined from the second training set used for training the preset diagnosis model, so as to perform adversarial training on the controller, improve the discrimination capability of the controller, and make the question data generated by the preset questioning model closer to real data; in still another aspect, the application constructs the active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so that the active dialogue large model actively asks the user questions, thereby acquiring more user information and making the inquiry results generated by the preset diagnosis model more accurate and reliable.
Further, the embodiment of the present application further discloses an electronic device, and fig. 5 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement relevant steps in the active dialogue large model construction method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol in which the communication interface is in compliance is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program that can be used to perform the active dialogue large model construction method performed by the electronic device 20 as disclosed in any of the previous embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the active dialogue large model construction method disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes the device and method provided by the present application in detail; specific examples are used herein to illustrate the principles and embodiments of the application, and the above description of the embodiments is only intended to help understand the method of the application and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application; in view of the above, the contents of this description should not be construed as limiting the present application.

Claims (10)

1. An active dialogue large model construction device, characterized by comprising:
the model determining module is used for determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models;
the training set construction module is used for inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples; the true samples are samples determined based on a second training set used for training the preset diagnosis model;
the controller training module is used for inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller;
and the large model construction module is used for constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller, so as to conduct a questioning dialogue by using the active dialogue large model.
2. The active dialogue large model construction device of claim 1, wherein the model determining module comprises:
the training set determining unit is used for collecting historical question-answer contents of a plurality of users and carrying out inversion processing on the historical question-answer contents to obtain a third training set;
and the questioning model training unit is used for training a first generative pre-trained Transformer model by using the third training set to obtain the preset questioning model.
3. The active dialogue large model construction device of claim 2, wherein the model determining module comprises:
the diagnosis model training unit is used for determining the historical question-answer contents and the corresponding historical inquiry results of the plurality of users as a second training set, and training a second generative pre-trained Transformer model by using the second training set to obtain the preset diagnosis model.
4. The active dialogue large model construction device of claim 3, wherein the training set construction module comprises:
and the training set construction unit is used for constructing a first training set based on the false sample, the first true sample and the second true sample.
5. The active dialogue large model construction device of claim 4, further comprising:
the first true sample determining unit is used for deleting the historical inquiry results contained in each training sample in the second training set so as to obtain the first true sample;
a second true sample determining unit, configured to determine the historical question-answer content and the corresponding number of question-answer rounds contained in any training sample in the second training set, and determine the second true sample based on the target historical question-answer content corresponding to the first N rounds; the first N rounds are rounds 1 through N, where N is smaller than the total number of question-answer rounds.
6. The active dialogue large model construction device of claim 1, further comprising:
a controller construction unit for constructing the initial controller based on an autoencoder and a classifier;
and an objective function construction unit for constructing the objective function based on a first loss function corresponding to the initial controller, a classification loss function corresponding to the classifier, and a second loss function corresponding to the preset questioning model.
7. The active dialogue large model construction device according to any one of claims 1 to 6, characterized in that the large model construction module comprises:
the question-answer content acquisition unit is used for acquiring the question-answer contents of all rounds completed by the current target user so as to obtain the current historical question-answer content of the target user;
a diagnosable rate determining unit for inputting the current historical question-answer content into the active dialogue large model to determine, through the target controller, a diagnosable rate corresponding to the current historical question-answer content;
a question data generating unit, configured to, if the diagnosable rate is not greater than a preset threshold, input the current historical question-answer content into the preset questioning model to generate current question data for the target user, and, when the target user's answer to the current question data is acquired, jump back to the step of acquiring the question-answer contents of all rounds completed by the current target user;
and an inquiry result generation unit, configured to, if the diagnosable rate is greater than the preset threshold, input the current historical question-answer content into the preset diagnosis model to generate a corresponding inquiry result.
8. An active dialogue large model construction method is characterized by comprising the following steps:
determining a preset questioning model and a preset diagnosis model based on generative pre-trained Transformer models;
inputting real medical record information of a user into the preset questioning model to generate corresponding false samples, and constructing a first training set based on the false samples and true samples; the true samples are samples determined based on a second training set used for training the preset diagnosis model;
inputting the first training set into an initial controller, so as to perform gradient updates on the parameters of the initial controller by using an adversarial training method and based on a pre-constructed objective function, thereby obtaining a target controller;
and constructing an active dialogue large model based on the preset questioning model, the preset diagnosis model and the target controller so as to conduct questioning dialogue by using the active dialogue large model.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the active dialogue large model construction method of claim 8.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the active dialogue large model construction method of claim 8.
CN202311499786.1A 2023-11-13 2023-11-13 Active dialogue large model construction device, method, equipment and storage medium Active CN117235239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311499786.1A CN117235239B (en) 2023-11-13 2023-11-13 Active dialogue large model construction device, method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311499786.1A CN117235239B (en) 2023-11-13 2023-11-13 Active dialogue large model construction device, method, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117235239A true CN117235239A (en) 2023-12-15
CN117235239B CN117235239B (en) 2024-02-20

Family

ID=89082891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311499786.1A Active CN117235239B (en) 2023-11-13 2023-11-13 Active dialogue large model construction device, method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117235239B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474043A (en) * 2023-12-27 2024-01-30 湖南三湘银行股份有限公司 Intelligent question-answering system based on training model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262618A1 (en) * 2016-03-11 2017-09-14 SafeLane Health, Inc. Systems and methods for managing a database during an examination
CN108304489A (en) * 2018-01-05 2018-07-20 广东工业大学 A kind of goal directed type personalization dialogue method and system based on intensified learning network
US20190012371A1 (en) * 2017-07-06 2019-01-10 International Business Machines Corporation Dialog agent for conducting task-oriented computer-based communications
CN111984771A (en) * 2020-07-17 2020-11-24 北京欧应信息技术有限公司 Automatic inquiry system based on intelligent conversation
CN112271001A (en) * 2020-11-17 2021-01-26 中山大学 Medical consultation dialogue system and method applying heterogeneous graph neural network
CN112397197A (en) * 2020-11-16 2021-02-23 康键信息技术(深圳)有限公司 Artificial intelligence-based inquiry data processing method and device
CN113223735A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Triage method, device and equipment based on session representation and storage medium
CN113569017A (en) * 2021-01-28 2021-10-29 腾讯科技(深圳)有限公司 Model processing method and device, electronic equipment and storage medium
CN114781402A (en) * 2022-05-12 2022-07-22 平安科技(深圳)有限公司 Method and device for identifying inquiry intention, electronic equipment and readable storage medium
US20230034414A1 (en) * 2019-12-12 2023-02-02 Nippon Telegraph And Telephone Corporation Dialogue processing apparatus, learning apparatus, dialogue processing method, learning method and program
CN115982335A (en) * 2023-02-14 2023-04-18 智慧眼科技股份有限公司 Active AI medical question-answering system, method, equipment and storage medium
CN116910212A (en) * 2023-07-12 2023-10-20 平安科技(深圳)有限公司 Dialog generation method, apparatus, electronic device, and computer-readable storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262618A1 (en) * 2016-03-11 2017-09-14 SafeLane Health, Inc. Systems and methods for managing a database during an examination
US20190012371A1 (en) * 2017-07-06 2019-01-10 International Business Machines Corporation Dialog agent for conducting task-oriented computer-based communications
CN108304489A (en) * 2018-01-05 2018-07-20 广东工业大学 A kind of goal directed type personalization dialogue method and system based on intensified learning network
US20230034414A1 (en) * 2019-12-12 2023-02-02 Nippon Telegraph And Telephone Corporation Dialogue processing apparatus, learning apparatus, dialogue processing method, learning method and program
CN111984771A (en) * 2020-07-17 2020-11-24 北京欧应信息技术有限公司 Automatic inquiry system based on intelligent conversation
CN112397197A (en) * 2020-11-16 2021-02-23 康键信息技术(深圳)有限公司 Artificial intelligence-based inquiry data processing method and device
CN112271001A (en) * 2020-11-17 2021-01-26 中山大学 Medical consultation dialogue system and method applying heterogeneous graph neural network
CN113569017A (en) * 2021-01-28 2021-10-29 腾讯科技(深圳)有限公司 Model processing method and device, electronic equipment and storage medium
CN113223735A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Triage method, device and equipment based on session representation and storage medium
WO2022227203A1 (en) * 2021-04-30 2022-11-03 平安科技(深圳)有限公司 Triage method, apparatus and device based on dialogue representation, and storage medium
CN114781402A (en) * 2022-05-12 2022-07-22 平安科技(深圳)有限公司 Method and device for identifying inquiry intention, electronic equipment and readable storage medium
CN115982335A (en) * 2023-02-14 2023-04-18 智慧眼科技股份有限公司 Active AI medical question-answering system, method, equipment and storage medium
CN116910212A (en) * 2023-07-12 2023-10-20 平安科技(深圳)有限公司 Dialog generation method, apparatus, electronic device, and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hu Ze; Zhang Zhan; Zuo Decheng: "Information Quality Prediction for Medical Question-Answering Systems Based on Domain-Specific Knowledge", Intelligent Computer and Applications, no. 06 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474043A (en) * 2023-12-27 2024-01-30 湖南三湘银行股份有限公司 Intelligent question-answering system based on training model
CN117474043B (en) * 2023-12-27 2024-04-02 湖南三湘银行股份有限公司 Intelligent question-answering system based on training model

Also Published As

Publication number Publication date
CN117235239B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
US11670324B2 (en) Method for predicting emotion status and robot
CN117235239B (en) Active dialogue large model construction device, method, equipment and storage medium
Doshi et al. Spoken language interaction with model uncertainty: an adaptive human–robot interaction system
Gong et al. Behavior explanation as intention signaling in human-robot teaming
US20220084428A1 (en) Learning content recommendation apparatus, system, and operation method thereof for determining recommendation question by reflecting learning effect of user
Deepika et al. Jollity Chatbot-a contextual AI assistant
CN108763495A (en) Interactive method, system, electronic equipment and storage medium
Schrempf et al. A generic model for estimating user intentions in human-robot cooperation
Nguyen et al. A framework for learning to request rich and contextually useful information from humans
Baskar et al. Cognitive architecture of an agent for human-agent dialogues
CN115982335A (en) Active AI medical question-answering system, method, equipment and storage medium
Petric et al. Towards a robot-assisted autism diagnostic protocol: Modelling and assessment with POMDP
CN117112742A (en) Dialogue model optimization method and device, computer equipment and storage medium
CN117009541A (en) Method, device, equipment and medium for constructing and applying clinical medicine inspection knowledge base
CN116630101A (en) Education teaching auxiliary system based on big data
CN113780394B (en) Training method, device and equipment for strong classifier model
EP4166079A1 (en) Conversation-based mental disorder screening method and device
Brockbank et al. Sampling data, beliefs, and actions
Ohmoto et al. A method to dynamically estimate emphasizing points and degree by using verbal and nonverbal information and physiological indices
CN116898441B (en) Character testing method and device based on man-machine conversation and electronic equipment
CN115409042B (en) Method and device for robot question answering based on thought guide graph
JP2019087155A (en) Supporting device, system, and program
CN113535903B (en) Emotion guiding method, emotion guiding robot, storage medium and electronic device
CN115309258B (en) Intelligent learning guiding method and device and electronic equipment
Nguyen et al. Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: China

Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000

Applicant after: Wisdom Eye Technology Co.,Ltd.

Address before: 410205, Changsha high tech Zone, Hunan Province, China

Applicant before: Wisdom Eye Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant