CN116956831A - Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116956831A
CN116956831A
Authority
CN
China
Prior art keywords
information
doctor
dialogue
model
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310800603.9A
Other languages
Chinese (zh)
Inventor
刘卓
徐卓扬
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310800603.9A
Publication of CN116956831A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/166: Editing, e.g. inserting or deleting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 80/00: ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of the application belong to the fields of artificial intelligence and digital healthcare, and relate to a medical dialogue generation method, apparatus, computer device and storage medium. The method comprises the following steps: inputting first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information; training an initial reward model on each piece of first doctor reply information and its reply evaluation value to obtain a reward model; inputting second dialogue information into the initial medical dialogue generation model to obtain second doctor reply information; inputting the second doctor reply information into the reward model to obtain a first reward value; performing reinforcement learning training on the initial medical dialogue generation model according to the second doctor reply information and its first reward value to obtain a medical dialogue generation model; and inputting third dialogue information into the medical dialogue generation model to obtain third doctor reply information. The application also relates to blockchain technology, and the dialogue information can be stored in a blockchain. The application improves the efficiency of medical dialogue.

Description

Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium
Technical Field
The present application relates to the fields of artificial intelligence and digital medical technology, and in particular to a medical dialogue generation method, apparatus, computer device and storage medium.
Background
With the development of Internet technology, more and more medical institutions provide medical services online. Online medical consultation is an important such service, in which doctors communicate with patients over the Internet and make diagnoses.
However, existing online consultation systems require the doctor to type the dialogue manually, so personal factors strongly affect the generated medical dialogue: the doctor may misspell words, make grammatical mistakes or express ideas unclearly, producing inaccurate dialogue information and hindering the progress of the consultation. In addition, the doctor needs time to compose and type the dialogue text, which further reduces the efficiency of the medical dialogue.
Disclosure of Invention
The embodiments of the application aim to provide a medical dialogue generation method, apparatus, computer device and storage medium, so as to solve the problem of low medical dialogue efficiency.
In order to solve the above technical problem, an embodiment of the present application provides a medical dialogue generation method adopting the following technical scheme:
acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and acquiring the reply evaluation value of each piece of first doctor reply information;
training an initial reward model on the obtained first doctor reply information and the corresponding reply evaluation values to obtain a reward model, wherein the reward value output by the reward model for doctor reply information matches the reply evaluation value of that information;
acquiring second dialogue information, and inputting the second dialogue information into the initial medical dialogue generation model to obtain second doctor reply information;
inputting the second doctor reply information into the reward model to obtain a first reward value;
performing reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor reply information and the corresponding first reward value to obtain a medical dialogue generation model; and
acquiring third dialogue information, and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor reply information, wherein the first, second and third dialogue information contain multiple pieces of information of the same types.
In order to solve the above technical problem, an embodiment of the present application further provides a medical dialogue generation apparatus adopting the following technical scheme:
a first input module, configured to acquire first dialogue information, input the first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and acquire the reply evaluation value of each piece of first doctor reply information;
a reward training module, configured to train an initial reward model on the obtained first doctor reply information and the corresponding reply evaluation values to obtain a reward model, wherein the reward value output by the reward model for doctor reply information matches the reply evaluation value of that information;
a second input module, configured to acquire second dialogue information and input the second dialogue information into the initial medical dialogue generation model to obtain second doctor reply information;
a reward input module, configured to input the second doctor reply information into the reward model to obtain a first reward value;
a reinforcement learning module, configured to perform reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor reply information and the corresponding first reward value to obtain a medical dialogue generation model; and
a reply generation module, configured to acquire third dialogue information and input the third dialogue information into the medical dialogue generation model to obtain third doctor reply information, wherein the first, second and third dialogue information contain multiple pieces of information of the same types.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device adopting the following technical scheme, the computer device performing the steps of:
acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and acquiring the reply evaluation value of each piece of first doctor reply information;
training an initial reward model on the obtained first doctor reply information and the corresponding reply evaluation values to obtain a reward model, wherein the reward value output by the reward model for doctor reply information matches the reply evaluation value of that information;
acquiring second dialogue information, and inputting the second dialogue information into the initial medical dialogue generation model to obtain second doctor reply information;
inputting the second doctor reply information into the reward model to obtain a first reward value;
performing reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor reply information and the corresponding first reward value to obtain a medical dialogue generation model; and
acquiring third dialogue information, and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor reply information, wherein the first, second and third dialogue information contain multiple pieces of information of the same types.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium adopting the following technical scheme, the stored instructions implementing the steps of:
acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and acquiring the reply evaluation value of each piece of first doctor reply information;
training an initial reward model on the obtained first doctor reply information and the corresponding reply evaluation values to obtain a reward model, wherein the reward value output by the reward model for doctor reply information matches the reply evaluation value of that information;
acquiring second dialogue information, and inputting the second dialogue information into the initial medical dialogue generation model to obtain second doctor reply information;
inputting the second doctor reply information into the reward model to obtain a first reward value;
performing reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor reply information and the corresponding first reward value to obtain a medical dialogue generation model; and
acquiring third dialogue information, and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor reply information, wherein the first, second and third dialogue information contain multiple pieces of information of the same types.
Compared with the prior art, the embodiments of the application have the following main beneficial effects: first dialogue information is acquired and input into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and reply evaluation values reflecting the quality of these pieces are acquired; an initial reward model is trained on the first doctor reply information and the reply evaluation values, so that the resulting reward model can output, for doctor reply information, a reward value that matches its reply evaluation value, allowing the quality of doctor reply information to be judged from the reward value; second dialogue information is acquired and input into the initial medical dialogue generation model to obtain second doctor reply information, the second doctor reply information is input into the reward model to obtain a first reward value, and reinforcement learning training is performed on the initial medical dialogue generation model according to the second doctor reply information and its first reward value, so that the model learns to output doctor reply information with higher reward values and its output becomes more accurate, natural and reasonable, yielding the medical dialogue generation model; third dialogue information, i.e. dialogue information between doctor and patient in application, is acquired, and the medical dialogue generation model outputs third doctor reply information from it, realizing automatic completion or generation of doctor replies in the medical dialogue. The doctor can thus concentrate on the consultation without typing the full text, while obtaining reply information matched to the doctor, which improves the accuracy and efficiency of the medical dialogue.
Drawings
To more clearly illustrate the solution of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the application, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a medical dialog generation method in accordance with the present application;
FIG. 3 is a schematic diagram of the structure of one embodiment of a medical dialog generating device in accordance with the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application is for describing particular embodiments only and is not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description, claims and drawings are intended to cover a non-exclusive inclusion. The terms "first", "second" and the like in the description, claims and drawings are used to distinguish different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the medical dialogue generating method provided by the embodiment of the application is generally executed by a server, and accordingly, the medical dialogue generating device is generally disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow chart of one embodiment of a medical dialog generation method according to the present application is shown. The medical dialogue generating method comprises the following steps:
Step S201, acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and acquiring the reply evaluation value of each piece of first doctor reply information.
In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the medical dialogue generation method runs may communicate with the terminal devices through wired or wireless connections. The wireless connection may include, but is not limited to, 3G/4G/5G, WiFi, Bluetooth, WiMAX, ZigBee and UWB (ultra wideband) connections, as well as other now known or later developed wireless connection types.
Specifically, first dialogue information is acquired. The dialogue information mentioned in the present application, namely the first, second and third dialogue information, is all information related to a dialogue between a doctor and a patient. Dialogue information is structured information containing next-level information of multiple types, and the information types of the next-level information contained in the first, second and third dialogue information are the same.
The first dialogue information is input into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information. In the present application, the initial medical dialogue generation model and the medical dialogue generation model output doctor reply information according to the input dialogue information; doctor reply information is a reply automatically generated on behalf of the doctor, i.e. a possible next reply of the doctor generated from the dialogue information.
Each piece of first doctor reply information also has a reply evaluation value, which can be obtained through labeling and represents the quality of that reply.
Further, step S201 may include: acquiring doctor information of a doctor, diagnosis and treatment information of the doctor for a patient, the existing medical dialogue between the doctor and the patient, and the doctor's existing input, to obtain the first dialogue information; converting the first dialogue information into an embedded vector; inputting the embedded vector into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information; and acquiring reply annotation information, and determining the reply evaluation value of each piece of first doctor reply information according to the reply annotation information.
Specifically, doctor information of a doctor, diagnosis and treatment information of the doctor for a patient, the existing medical dialogue between the doctor and the patient, and the doctor's existing input are acquired, and the first dialogue information is generated from them. As noted above, the information types of the next-level information contained in the first, second and third dialogue information are the same: each may include doctor information, diagnosis and treatment information, an existing medical dialogue and an existing input, with only the specific contents differing between the different dialogue information.
The doctor information may be information related to the doctor, such as the doctor's identification number, the subdivided medical field the doctor works in, and the doctor's experience level, and may also include the doctor's medical dialogues with other patients, texts written by the doctor, and the like; it expresses the doctor's identity and expertise. Diagnosis and treatment information is generated when the doctor diagnoses the patient, and includes the doctor's diagnosis of the patient's condition, the treatment scheme proposed for the patient, and so on. Questions and answers already exchanged between the doctor and the patient are recorded as the existing medical dialogue. During an online consultation, the doctor may enter information via a terminal; this is recorded as the existing input, which may be fragmentary, incomplete and not yet sent to the patient. The medical dialogue generation method can complete this existing input. It will be appreciated that the existing input may also be blank, in which case the medical dialogue generation model generates the complete doctor reply information.
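A minimal sketch of this structured dialogue information as a container type; the field names below are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogueInfo:
    """Structured dialogue information: the first, second and third dialogue
    information all carry next-level information of these same types."""
    doctor_info: str          # identity, subdivided medical field, experience level
    diagnosis_info: str       # diagnosis results, treatment scheme
    existing_dialogue: List[str] = field(default_factory=list)  # prior Q&A turns
    existing_input: str = ""  # doctor's fragmentary, unsent draft (may be blank)

info = DialogueInfo(
    doctor_info="cardiology, senior",
    diagnosis_info="suspected hypertension",
    existing_dialogue=["Patient: I get headaches in the morning."],
)
```

Only the contents differ between first, second and third dialogue information; the field types stay fixed.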
The first dialogue information is encoded to obtain an embedded vector: the doctor information, diagnosis and treatment information, existing medical dialogue and existing input in the first dialogue information may each be converted into an embedded vector, and these vectors are then spliced in a preset order to obtain the embedded vector of the first dialogue information.
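The encode-and-splice step might look as follows; the per-field hashing embedder and the dimension are toy stand-ins for a learned encoder:

```python
import math

DIM = 8  # per-field embedding size (illustrative assumption)

def embed_field(text, dim=DIM):
    """Toy embedding: bucket character codes into a fixed-size vector,
    then L2-normalise. A real system would use a learned encoder."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def dialogue_embedding(doctor_info, diagnosis_info, existing_dialogue, existing_input):
    """Embed each next-level field separately, then splice the embeddings
    in a preset order to form the dialogue-information embedding."""
    fields = [doctor_info, diagnosis_info, " ".join(existing_dialogue), existing_input]
    emb = []
    for f in fields:
        emb.extend(embed_field(f))
    return emb

emb = dialogue_embedding("cardiology", "hypertension", ["headaches"], "")
```

A blank existing input simply contributes a zero block, so the spliced vector keeps a fixed layout regardless of which fields are filled in.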
The resulting embedded vector is input into the initial medical dialogue generation model, which outputs a plurality of pieces of first doctor reply information according to the embedded vector.
Reply annotation information is then obtained; it may be input through a terminal by the doctor associated with the first dialogue information. The doctor evaluates the accuracy, reasonableness, readability and so on of each piece of first doctor reply information and feeds back reply annotation information carrying the reply evaluation value of that piece. The reply evaluation value may be a number, a letter or the like, as long as it reflects the quality of the first doctor reply information.
In this embodiment, doctor information of a doctor, diagnosis and treatment information of the doctor for a patient, the existing medical dialogue between the doctor and the patient, and the doctor's existing input are obtained to form the first dialogue information, which reflects the doctor's professional field, experience level and personal characteristics, ensuring that subsequent dialogue completion or replies are accurate and personalized; the first dialogue information is converted into an embedded vector and input into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information together with their reply evaluation values, ensuring that subsequent model training can proceed smoothly.
Further, after the step of acquiring the reply evaluation value of each piece of first doctor reply information, the method may further include: acquiring a preset reply retention policy; and screening the first doctor reply information according to the reply retention policy and the reply evaluation values, retaining at least one screened piece of first doctor reply information.
Specifically, a preset reply retention policy is acquired. The initial medical dialogue generation model outputs a plurality of pieces of first doctor reply information, some of which are of poor quality and of little value for model training, so each piece can be screened according to the reply retention policy. For example, the pieces whose reply evaluation values rank in the top N or top M% may be retained; or an evaluation-value threshold in the reply retention policy may be acquired, and the pieces whose reply evaluation value is greater than or equal to the threshold retained.
In this embodiment, screening the first doctor reply information according to the reply retention policy retains the pieces of higher value for model training, ensuring the accuracy of subsequent training.
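The retention step can be sketched as a small filter; the policy key names (`top_n`, `top_percent`, `threshold`) are assumptions for illustration:

```python
def retain_replies(replies, policy):
    """Filter candidate doctor replies by a retention policy.

    replies: list of (reply_text, evaluation_value) pairs.
    policy:  {'top_n': N}, {'top_percent': M} or {'threshold': T}
             (key names are illustrative, not from the patent).
    At least one reply is always retained.
    """
    ranked = sorted(replies, key=lambda r: r[1], reverse=True)
    if "top_n" in policy:
        kept = ranked[: policy["top_n"]]
    elif "top_percent" in policy:
        k = max(1, round(len(ranked) * policy["top_percent"] / 100))
        kept = ranked[:k]
    else:
        kept = [r for r in ranked if r[1] >= policy["threshold"]]
    return kept or ranked[:1]  # never discard everything

candidates = [("Take ibuprofen.", 2), ("Please describe the pain.", 9),
              ("See a specialist.", 7), ("Drink water.", 1)]
```

Falling back to the single best-ranked reply when a threshold filters out everything keeps the "retain at least one" guarantee from the method.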
Step S202, training an initial reward model on the obtained first doctor reply information and its corresponding reply evaluation values to obtain a reward model, wherein the reward value output by the reward model for doctor reply information matches the reply evaluation value of that information.
Specifically, the application provides an initial reward model and a trained reward model, which output a reward value according to doctor reply information; the reward value is a reward signal that reflects the quality of the doctor reply information.
To this end, the initial reward model is trained on the obtained first doctor reply information and the corresponding reply evaluation values, yielding the reward model. The reward value the reward model outputs for doctor reply information matches that information's reply evaluation value, i.e. the higher the quality of the reply, the greater the reward value.
Step S203, second dialogue information is acquired and input into the initial medical dialogue generation model to obtain second doctor reply information.
Specifically, second dialogue information about a doctor and a patient is acquired; its content may be the same as or differ from that of the first dialogue information.
The second dialogue information is input into the initial medical dialogue generation model to obtain second doctor reply information. It should be noted that the initial medical dialogue generation model may output only one piece of second doctor reply information at a time from the second dialogue information.
Step S204, inputting the second doctor reply information into the reward model to obtain a first reward value.
Specifically, the second doctor reply information is input into the trained reward model, which outputs the first reward value.
Step S205, reinforcement learning training is performed on the initial medical dialogue generation model according to the obtained second doctor reply information and its corresponding first reward value, to obtain the medical dialogue generation model.
Specifically, the first reward value of the second doctor reply information reflects its quality, i.e. how well the initial medical dialogue generation model completes or generates the dialogue.
The application performs reinforcement learning training on the initial medical dialogue generation model according to the second doctor reply information and its first reward value, so that the model learns to output second doctor reply information with higher first reward values and thus produces more accurate, natural and reasonable doctor reply information; when training finishes, the medical dialogue generation model is obtained.
It will be appreciated that, to ensure the accuracy of the resulting reward model and medical dialogue generation model, dialogue information from many different doctors needs to be used during training, and each doctor may contribute multiple sets of dialogue information.
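The reinforcement learning step can be illustrated with a toy REINFORCE loop over a fixed set of candidate replies, where `reward_fn` stands in for the trained reward model. A real system would update a text-generation model (for example with PPO); everything below is a simplified sketch of the training signal only:

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce(logits, reward_fn, steps=2000, lr=0.5, seed=0):
    """Toy policy gradient: the 'policy' is a categorical distribution over
    candidate doctor replies. REINFORCE with an expected-reward baseline
    raises the probability of replies the reward model scores highly."""
    rng = random.Random(seed)
    logits = list(logits)
    n = len(logits)
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(n), weights=probs)[0]   # sample a reply
        baseline = sum(p * reward_fn(i) for i, p in enumerate(probs))
        adv = reward_fn(a) - baseline                 # advantage over baseline
        for i in range(n):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * adv * grad
    return logits

rewards = [0.1, 0.9, 0.3]  # first reward values per candidate reply (invented)
final = reinforce([0.0, 0.0, 0.0], lambda a: rewards[a])
```

After training, the policy concentrates its probability mass on the highest-reward reply, mirroring how the generation model is steered toward replies with higher first reward values.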
Step S206, obtaining third dialogue information, and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor answer information, wherein the first dialogue information, the second dialogue information and the third dialogue information have a plurality of information of the same type.
Specifically, third dialogue information is acquired, and the third dialogue information may be dialogue information of a doctor and a patient in practical application.
The third dialogue information is input into the medical dialogue generation model to obtain third doctor reply information, which is automatically generated or completed doctor reply information; the current medical dialogue can be advanced on the basis of this reply without requiring the doctor to type the full text.
Further, after the step of obtaining the third doctor reply information, the method may further include: displaying the third doctor reply information through a terminal held by the doctor; and, when a confirmation instruction for the third doctor reply information is received, sending the third doctor reply information to a terminal held by the patient and displaying it there.
Specifically, the third doctor reply information is presented on a terminal held by the doctor so that the doctor can review it. The doctor can operate the terminal to trigger the confirmation instruction directly, or first adjust the third doctor reply information and then trigger it. On receiving the confirmation instruction, the server sends the confirmed third doctor reply information to the terminal held by the patient, where it is displayed, thereby advancing the current medical dialogue.
In this embodiment, the third doctor reply information is displayed through a terminal held by the doctor and, according to the confirmation instruction, sent to and displayed on the terminal held by the patient, so that the medical dialogue between doctor and patient is advanced.
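The display-confirm-forward flow described above can be sketched as follows. The `Terminal` class and its methods are illustrative stand-ins for the doctor's and patient's terminals, not part of the patent; the point is only the gating logic: nothing reaches the patient until the doctor confirms (possibly after editing).

```python
class Terminal:
    """Minimal stand-in for a doctor's or patient's terminal."""
    def __init__(self, confirm=None):
        self.shown = []          # messages displayed on this terminal
        self._confirm = confirm  # simulated doctor action (None = no confirmation)

    def display(self, text):
        self.shown.append(text)

    def await_confirmation(self, text):
        # Returns the (possibly edited) reply on confirmation, else None.
        return self._confirm

def forward_reply(doctor_terminal, patient_terminal, generated_reply):
    """Show a generated reply to the doctor; forward it to the patient
    only after the doctor confirms (the doctor may first edit the text)."""
    doctor_terminal.display(generated_reply)
    confirmed = doctor_terminal.await_confirmation(generated_reply)
    if confirmed is not None:
        patient_terminal.display(confirmed)
    return confirmed

# The doctor edits the draft before confirming; only the edited text is sent.
doc = Terminal(confirm="Please have the blood test done on an empty stomach.")
pat = Terminal()
sent = forward_reply(doc, pat, "Please have the blood test done.")
assert sent in pat.shown
```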
In this embodiment, first dialogue information is acquired and input into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and a reply evaluation value reflecting the quality of each piece is acquired. An initial reward model is trained on the first doctor reply information and the reply evaluation values, so that the resulting reward model outputs, for given doctor reply information, a reward value that matches its reply evaluation value; the merit of doctor reply information can therefore be judged by its reward value. Second dialogue information is acquired and input into the initial medical dialogue generation model to obtain second doctor reply information, which is input into the reward model to obtain a first reward value; reinforcement learning training is then performed on the initial medical dialogue generation model using the second doctor reply information and its first reward value, steering the model toward replies with higher reward values and thus toward more accurate, natural and reasonable output, yielding the medical dialogue generation model. Finally, third dialogue information, namely doctor-patient dialogue information in actual use, is acquired, and the medical dialogue generation model outputs third doctor reply information from it, automatically completing or generating the doctor's reply; the doctor can concentrate on the consultation and still obtain well-matched reply information without typing the full text, improving the accuracy and efficiency of the medical dialogue.
Further, the step of training the initial reward model according to the obtained pieces of first doctor reply information and their corresponding reply evaluation values to obtain the reward model may include: combining the obtained pieces of first doctor reply information in pairs to obtain a plurality of reply information pairs; inputting each reply information pair into the initial reward model to obtain second reward values for the two pieces of first doctor reply information in the pair; calculating a prediction error for the reply information pair according to the second reward values and the reply evaluation values of the two pieces; and adjusting the model parameters of the initial reward model according to the prediction error of each reply information pair until the model converges, to obtain the reward model.
Specifically, the obtained first doctor reply information is combined in pairs to obtain a plurality of reply information pairs.
Each reply information pair is input into the initial reward model, which outputs a second reward value for each of the two pieces of first doctor reply information in the pair. The second reward values are the initial reward model's judgment of the quality of the two pieces of first doctor reply information, while their reply evaluation values serve as labeling information, i.e. the doctor's judgment of that quality.
From the second reward values and the reply evaluation values of the two pieces of first doctor reply information in a pair, the prediction error of that reply information pair can be calculated, and hence the total prediction error over all reply information pairs. The model parameters of the initial reward model are adjusted with the aim of reducing this prediction error; the parameter-adjusted initial reward model is then trained iteratively until the model converges, yielding the reward model.
In one embodiment, the model parameters of the initial reward model may instead be adjusted according to the prediction error of a single reply information pair before the next pair is input, iterating over the pairs in turn. During training, the model learns the difference between the two pieces of first doctor reply information in each pair and expresses that difference through the reward values, thereby reflecting the relative merit of the replies.
In this embodiment, the obtained pieces of first doctor reply information are combined in pairs to obtain a plurality of reply information pairs, and each pair is input into the initial reward model to obtain second reward values for its two pieces of first doctor reply information. Since the second reward values are the model's judgment of the merit of the replies and the reply evaluation values are the labels for that merit, the prediction error of each reply information pair can be calculated from the second reward values and the reply evaluation values; the model parameters of the initial reward model are then adjusted according to these prediction errors until the model converges, ensuring the accuracy of the resulting reward model.
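As an illustration of the pairwise training step described above, here is a minimal sketch using a pairwise logistic (Bradley–Terry-style) loss over a toy linear reward model. The linear model, the feature vectors, and the learning rate are all assumptions for demonstration; the patent's reward model would score the reply text itself.

```python
import numpy as np

def pairwise_loss_and_grad(w, x_a, x_b, label):
    """Pairwise logistic loss for a toy linear reward model.

    w        -- weights of the reward model (illustrative stand-in)
    x_a, x_b -- feature vectors of the two first-doctor replies in one pair
    label    -- 1.0 if reply A has the higher reply evaluation value, else 0.0
    """
    r_a, r_b = w @ x_a, w @ x_b              # second reward values of the pair
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))    # P(model prefers reply A)
    loss = -(label * np.log(p_a) + (1 - label) * np.log(1 - p_a))
    grad = (p_a - label) * (x_a - x_b)       # d loss / d w
    return loss, grad

# Iteratively adjust parameters to reduce the prediction error of one pair.
rng = np.random.default_rng(0)
w = rng.normal(size=4)
x_good, x_bad = rng.normal(size=4), rng.normal(size=4)
for _ in range(200):
    loss, grad = pairwise_loss_and_grad(w, x_good, x_bad, label=1.0)
    w -= 0.1 * grad
# After training, the model assigns the preferred reply the higher reward.
assert w @ x_good > w @ x_bad
```

The loss is small exactly when the model's ranking (second reward values) agrees with the annotated ranking (reply evaluation values), which is the matching property the patent requires of the reward model.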
Further, the step of calculating the prediction error of a reply information pair according to the second reward values and the reply evaluation values of its two pieces of first doctor reply information may include: comparing the reply evaluation values of the two pieces to obtain a first relation; comparing their second reward values to obtain a second relation; and calculating the prediction error of the reply information pair according to the first relation and the second relation.
Specifically, the reply evaluation values of the two pieces of first doctor reply information in a pair are compared to obtain a first relation, which reflects the relative merit of the two replies as judged by the reply evaluation values. Their second reward values are compared to obtain a second relation, which reflects the relative merit of the two replies as judged by the second reward values.
When the first relation and the second relation are consistent, the reward values output by the initial reward model are accurate; otherwise they are not. Accordingly, the prediction error of the reply information pair can be determined from whether the first relation and the second relation agree.
In this embodiment, the reply evaluation values of the two pieces of first doctor reply information in a pair are compared to obtain a first relation, and their second reward values are compared to obtain a second relation; the prediction error of the reply information pair can then be computed from whether the conclusions of the first relation and the second relation are consistent.
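The relation-consistency check above reduces to comparing two orderings. In this sketch the function name and the choice of a hard 0/1 error (rather than, say, a smooth logistic loss over the reward gap) are illustrative assumptions:

```python
def pair_prediction_error(eval_a, eval_b, reward_a, reward_b):
    """Return 0.0 when the reward model ranks the pair the same way the
    reply evaluation values do (first and second relations agree),
    and 1.0 otherwise."""
    first_relation = eval_a > eval_b        # ordering by annotated evaluation
    second_relation = reward_a > reward_b   # ordering by predicted reward
    return 0.0 if first_relation == second_relation else 1.0

assert pair_prediction_error(0.9, 0.4, 2.1, 1.3) == 0.0  # relations agree
assert pair_prediction_error(0.9, 0.4, 0.7, 1.5) == 1.0  # relations differ
```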
Further, step S205 may include: calculating a cumulative reward value based on the obtained first reward values of the second doctor reply information; and adjusting the model parameters of the initial medical dialogue generation model in the direction that maximizes the cumulative reward value, then inputting the second dialogue information into the parameter-adjusted initial medical dialogue generation model for iterative training until the model converges, to obtain the medical dialogue generation model.
Specifically, reinforcement learning involves the concept of a cumulative reward value. The second dialogue information is input into the initial medical dialogue generation model to obtain second doctor reply information, which is input into the reward model to obtain a first reward value. Over the course of iterative training, the obtained first reward values are accumulated into a cumulative reward value.
It will be appreciated that the larger the cumulative reward value, the better the generation effect of the initial medical dialogue generation model. The model parameters of the initial medical dialogue generation model are therefore adjusted in the direction that maximizes the cumulative reward value; the second dialogue information is then input into the parameter-adjusted model for iterative training until the model converges, yielding the medical dialogue generation model.
The present application may employ a Model-based RL (Model-Based Reinforcement Learning) approach, and the loss function of the PPO (Proximal Policy Optimization) algorithm may be employed in the reinforcement learning training of the initial medical dialogue generation model.
In this embodiment, a cumulative reward value is calculated from the obtained first reward values of the second doctor reply information; this cumulative reward value reflects the model's performance over multiple rounds of generation. The model parameters of the initial medical dialogue generation model are adjusted in the direction that maximizes the cumulative reward value, and the second dialogue information is input into the parameter-adjusted model for iterative training until the model converges, ensuring the accuracy of the resulting medical dialogue generation model.
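A minimal sketch of the cumulative reward and the PPO clipped surrogate objective named above. The use of per-reply log-probabilities and baseline-subtracted advantages derived from the first reward values is an illustrative modeling choice, not prescribed by the patent:

```python
import numpy as np

def cumulative_reward(rewards, gamma=1.0):
    """Cumulative reward over one dialogue: discounted sum of the first
    reward values assigned to successive generated replies."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized by adjusting the
    dialogue model's parameters)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

rewards = [0.4, 0.7, 0.9]                  # first reward values per reply
adv = np.array(rewards) - np.mean(rewards)  # baseline-subtracted advantages
obj = ppo_clip_objective(np.log([0.3, 0.5, 0.4]),
                         np.log([0.25, 0.5, 0.45]), adv)
print(round(cumulative_reward(rewards), 2))  # → 2.0
```

The clipping keeps each update close to the previous policy, which is what makes the iterative training stable while the cumulative reward is pushed upward.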
It is emphasized that to further ensure the privacy and security of the session information, the session information may also be stored in a blockchain node. The dialogue information may include first dialogue information and second dialogue information.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiments of the application may acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The application can be applied to the field of intelligent medical treatment, thereby promoting the construction of intelligent cities.
Those skilled in the art will appreciate that implementing all or part of the methods described above may be accomplished by computer-readable instructions stored in a computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2 described above, the present application provides an embodiment of a medical dialogue generating device, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic apparatuses.
As shown in fig. 3, the medical dialogue generating device 300 according to the present embodiment includes: a first input module 301, a reward training module 302, a second input module 303, a reward input module 304, a reinforcement learning module 305, and a reply generation module 306, wherein:
The first input module 301 is configured to obtain first dialogue information, input the first dialogue information into the initial medical dialogue generation model to obtain a plurality of first doctor answer information, and obtain answer evaluation values of the first doctor answer information.
And the reward training module 302 is configured to train an initial reward model according to the obtained first doctor answer information and the corresponding answer evaluation value thereof, to obtain a reward model, where the reward value output by the reward model according to the doctor answer information is matched with the answer evaluation value of the doctor answer information.
The second input module 303 is configured to obtain second dialogue information, and input the second dialogue information into the initial medical dialogue generation model to obtain second doctor response information.
And a reward input module 304, configured to input second doctor response information into the reward model to obtain a first reward value.
The reinforcement learning module 305 is configured to perform reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor answer information and the corresponding first reward value, so as to obtain the medical dialogue generation model.
The reply generation module 306 is configured to obtain third dialogue information and input it into the medical dialogue generation model to obtain third doctor reply information, where the first dialogue information, the second dialogue information and the third dialogue information contain several pieces of information of the same types.
In this embodiment, first dialogue information is acquired and input into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information, and a reply evaluation value reflecting the quality of each piece is acquired. An initial reward model is trained on the first doctor reply information and the reply evaluation values, so that the resulting reward model outputs, for given doctor reply information, a reward value that matches its reply evaluation value; the merit of doctor reply information can therefore be judged by its reward value. Second dialogue information is acquired and input into the initial medical dialogue generation model to obtain second doctor reply information, which is input into the reward model to obtain a first reward value; reinforcement learning training is then performed on the initial medical dialogue generation model using the second doctor reply information and its first reward value, steering the model toward replies with higher reward values and thus toward more accurate, natural and reasonable output, yielding the medical dialogue generation model. Finally, third dialogue information, namely doctor-patient dialogue information in actual use, is acquired, and the medical dialogue generation model outputs third doctor reply information from it, automatically completing or generating the doctor's reply; the doctor can concentrate on the consultation and still obtain well-matched reply information without typing the full text, improving the accuracy and efficiency of the medical dialogue.
In some alternative implementations of the present embodiment, the first input module 301 may include: the system comprises a first generation sub-module, a vector generation sub-module, a first reply sub-module and a label acquisition sub-module, wherein:
the first generation sub-module is used for acquiring doctor information of a doctor, diagnosis and treatment information of the doctor on a patient, existing medical dialogs of the doctor and the patient and existing input of the doctor, and obtaining first dialog information.
And the vector generation sub-module is used for converting the first dialogue information into an embedded vector.
And the first response sub-module is used for inputting the embedded vector into the initial medical dialogue generating model to obtain a plurality of first doctor response information.
The label acquisition sub-module is used for acquiring the reply label information and determining the reply evaluation value of each first doctor reply information according to the reply label information.
In this embodiment, the doctor information of a doctor, the doctor's diagnosis and treatment information for a patient, the existing medical dialogue between doctor and patient, and the doctor's existing input are obtained to form the first dialogue information, which reflects the doctor's professional field, experience level and personalized information, ensuring that subsequent dialogue completions or replies are accurate and personalized; the first dialogue information is converted into an embedded vector and input into the initial medical dialogue generation model to obtain a plurality of pieces of first doctor reply information together with their reply evaluation values, ensuring that subsequent model training proceeds smoothly.
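As a sketch of how the four information sources might be assembled into first dialogue information and converted into an embedded vector: the field layout, separator tags, and hashed bag-of-words embedding below are illustrative assumptions; a real system would use a learned tokenizer and embedding layer.

```python
import hashlib

import numpy as np

def build_first_dialogue_text(doctor_info, diagnosis_info, history, partial_input):
    """Concatenate doctor info, diagnosis info, the existing dialogue and
    the doctor's partial input into one model input string."""
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (f"[DOCTOR] {doctor_info}\n[DIAGNOSIS] {diagnosis_info}\n"
            f"{turns}\nDoctor: {partial_input}")

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding of the dialogue text, normalized
    to unit length."""
    v = np.zeros(dim)
    for tok in text.split():
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    return v / max(1.0, np.linalg.norm(v))

text = build_first_dialogue_text(
    "Cardiology, attending physician",
    "Hypertension, stage 2",
    [("Patient", "I have been dizzy in the mornings."),
     ("Doctor", "How long has this been going on?"),
     ("Patient", "About two weeks.")],
    "I suggest you first")
vec = embed(text)
assert text.endswith("Doctor: I suggest you first")
```

Ending the string with the doctor's partial input is what lets the generation model treat the task as completion rather than free-form reply generation.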
In some optional implementations of the present embodiment, the medical dialog generation device 300 may further include: the system comprises a strategy acquisition module and a reply screening module, wherein:
and the strategy acquisition module is used for acquiring a preset reply reservation strategy.
And the answer screening module is used for screening the first doctor answer information according to the answer retention strategy and the answer evaluation value of the first doctor answer information and retaining at least one screened first doctor answer information.
In this embodiment, the reply information of each first doctor is screened according to the reply retention policy, so that the reply information of the first doctor with higher value for model training is retained, and the accuracy of subsequent model training is ensured.
In some alternative implementations of the present embodiment, the reward training module 302 may include: the system comprises a reply combination sub-module, an input sub-module, an error calculation sub-module and a model adjustment sub-module, wherein:
and the reply combination sub-module is used for combining the obtained reply information of each first doctor in pairs to obtain a plurality of reply information pairs.
And the input sub-module is used for inputting each reply information pair into the initial rewarding model to obtain second rewarding values of the reply information of the two first doctors in the reply information pair.
And the error calculation sub-module is used for calculating the prediction error of the reply information pair according to the second reward values and the reply evaluation values of the reply information of the two first doctors in the reply information pair.
And the model adjustment sub-module is used for adjusting model parameters of the initial rewarding model according to the prediction errors of the reply information pairs until the model converges to obtain the rewarding model.
In this embodiment, the obtained pieces of first doctor reply information are combined in pairs to obtain a plurality of reply information pairs, and each pair is input into the initial reward model to obtain second reward values for its two pieces of first doctor reply information. Since the second reward values are the model's judgment of the merit of the replies and the reply evaluation values are the labels for that merit, the prediction error of each reply information pair can be calculated from the second reward values and the reply evaluation values; the model parameters of the initial reward model are then adjusted according to these prediction errors until the model converges, ensuring the accuracy of the resulting reward model.
In some alternative implementations of the present embodiment, the error calculation sub-module may include: an evaluation value comparison unit, a bonus value comparison unit, and an error calculation unit, wherein:
And the evaluation value comparison unit is used for comparing the reply evaluation values of the two pieces of first doctor reply information in a reply information pair to obtain a first relation.
And the reward value comparison unit is used for comparing the second reward values of the two pieces of first doctor reply information to obtain a second relation.
And the error calculation unit is used for calculating the prediction error of the reply information pair based on the first relation and the second relation.
In this embodiment, the reply evaluation values of the two pieces of first doctor reply information in a pair are compared to obtain a first relation, and their second reward values are compared to obtain a second relation; the prediction error of the reply information pair can then be computed from whether the conclusions of the first relation and the second relation are consistent.
In some alternative implementations of the present embodiment, the reinforcement learning module 305 may include: an accumulation calculation sub-module and an iterative training sub-module, wherein:
and a cumulative calculation sub-module for calculating a cumulative prize value based on the obtained first prize value of the second doctor's reply information.
And the iterative training sub-module is used for adjusting the model parameters of the initial medical dialogue generation model in the direction that maximizes the cumulative reward value, and for inputting the second dialogue information into the parameter-adjusted initial medical dialogue generation model for iterative training until the model converges, to obtain the medical dialogue generation model.
In this embodiment, a cumulative reward value is calculated from the obtained first reward values of the second doctor reply information; this cumulative reward value reflects the model's performance over multiple rounds of generation. The model parameters of the initial medical dialogue generation model are adjusted in the direction that maximizes the cumulative reward value, and the second dialogue information is input into the parameter-adjusted model for iterative training until the model converges, ensuring the accuracy of the resulting medical dialogue generation model.
In some optional implementations of the present embodiment, the medical dialog generation device 300 may include: the reply display module and the reply sending module, wherein:
and the answer display module is used for displaying third doctor answer information through a terminal held by the doctor.
And the reply sending module is used for sending the third doctor reply information to the terminal held by the patient and displaying the third doctor reply information when receiving the confirmation instruction aiming at the third doctor reply information.
In this embodiment, the third doctor reply information is displayed through a terminal held by the doctor and, according to the confirmation instruction, sent to and displayed on the terminal held by the patient, so that the medical dialogue between doctor and patient is advanced.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as computer-readable instructions of the medical dialogue generation method. The memory 41 may further be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the medical dialogue generating method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The computer device provided in the present embodiment may perform the above-described medical dialogue generation method. The medical dialogue generation method here may be the medical dialogue generation method of each of the above embodiments.
In this embodiment, first dialogue information is acquired and input into an initial medical dialogue generation model to obtain a plurality of items of first doctor answer information, and an answer evaluation value, which reflects the quality of the answer, is acquired for each item. An initial reward model is trained on the obtained first doctor answer information and the corresponding answer evaluation values; the resulting reward model outputs, for an item of doctor answer information, a reward value that matches its answer evaluation value, so that the quality of doctor answer information can be judged from its reward value. Second dialogue information is then acquired and input into the initial medical dialogue generation model to obtain second doctor answer information, which is input into the reward model to obtain a first reward value; reinforcement learning training is performed on the initial medical dialogue generation model according to the second doctor answer information and its first reward value, driving the model to output answers with higher reward values and thus more accurate, natural and reasonable text, which yields the medical dialogue generation model. Finally, third dialogue information, i.e. the doctor-patient dialogue at application time, is acquired, and the medical dialogue generation model outputs third doctor answer information from it. Doctor replies in a medical dialogue are thereby completed or generated automatically: the doctor can concentrate on the consultation without typing complete text, yet still obtain reply information matched to the doctor, improving the accuracy and efficiency of the medical dialogue.
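As a non-authoritative illustration of the three training stages summarized above (candidate generation, reward model fitting, reinforcement learning), the following Python sketch uses toy stand-ins; every function name here is hypothetical, and the real system would use neural generation and reward models rather than lookup tables and greedy selection:

```python
# Toy sketch of the pipeline: the names and data structures are assumptions,
# not the patented implementation.

def generate_candidates(model, dialogue, n=4):
    # Stage 1: sample several candidate doctor replies for one dialogue
    return [model(dialogue, seed) for seed in range(n)]

def train_reward_model(candidates, evaluations):
    # Stage 2: fit a reward model whose scores match the human answer
    # evaluation values (a lookup table stands in for a trained network)
    table = dict(zip(candidates, evaluations))
    return lambda reply: table.get(reply, 0.0)

def rl_select(candidates, reward_model):
    # Stage 3: steer the generator toward high-reward replies (greedy
    # selection stands in for the actual policy-gradient update)
    return max(candidates, key=reward_model)

toy_model = lambda dialogue, seed: f"reply-{seed}"
candidates = generate_candidates(toy_model, "patient: persistent headache")
reward_model = train_reward_model(candidates, [0.2, 0.9, 0.5, 0.1])
best = rl_select(candidates, reward_model)
print(best)
```

In this toy run the reply with the highest human evaluation value (`reply-1`) also receives the highest reward and is selected, which is exactly the behaviour the reward model is trained to produce.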
The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the medical dialogue generation method described above.
The advantageous effects of this embodiment are the same as those described above for the computer device embodiment.
From the description of the embodiments above, those skilled in the art will appreciate that the method of the embodiments may be implemented in software running on a necessary general-purpose hardware platform, or alternatively in hardware, although the former is preferred in many cases. On this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method of the embodiments of the present application.
The embodiments described above are only some, not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, which do not limit the scope of the claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or substitute equivalents for some of their features. Any equivalent structure made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the application.

Claims (10)

1. A medical dialogue generation method, comprising the steps of:
acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generation model to obtain a plurality of first doctor answer information, and acquiring answer evaluation values of the first doctor answer information;
training an initial reward model according to the obtained first doctor answer information and the corresponding answer evaluation values to obtain a reward model, wherein the reward value output by the reward model for an item of doctor answer information matches the answer evaluation value of that item;
acquiring second dialogue information, and inputting the second dialogue information into the initial medical dialogue generation model to obtain second doctor response information;
inputting the second doctor reply information into the reward model to obtain a first reward value;
performing reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor reply information and the corresponding first reward value to obtain a medical dialogue generation model;
and acquiring third dialogue information, and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor response information, wherein the first dialogue information, the second dialogue information and the third dialogue information each comprise a plurality of items of information of the same types.
2. The medical dialogue generating method according to claim 1, wherein the step of acquiring first dialogue information, inputting the first dialogue information into an initial medical dialogue generating model to obtain a plurality of first doctor answer information, and acquiring answer evaluation values of the respective first doctor answer information comprises:
acquiring doctor information of a doctor, diagnosis and treatment information of the doctor on a patient, an existing medical dialogue of the doctor and the patient and an existing input of the doctor, and obtaining first dialogue information;
converting the first dialogue information into an embedded vector;
inputting the embedded vector into an initial medical dialogue generating model to obtain a plurality of first doctor reply information;
and obtaining reply annotation information, and determining a reply evaluation value of each first doctor reply information according to the reply annotation information.
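By way of illustration only (not part of the claims), the assembly and embedding steps of claim 2 can be sketched as follows; the `[SEP]` separator, the field order, and the hashing embedding are all assumptions standing in for the model's real tokenizer and learned embedding layer:

```python
def build_first_dialogue_info(doctor_info, diagnosis_info, dialogue_history, doctor_input):
    # Concatenate the four sources named in the claim; the separator and
    # ordering are hypothetical choices
    parts = [doctor_info, diagnosis_info, *dialogue_history, doctor_input]
    return " [SEP] ".join(parts)

def to_embedding(text, dim=8):
    # Toy hashing embedding standing in for a learned embedding layer
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

info = build_first_dialogue_info(
    "Dr. Li, cardiology",
    "hypertension, 3 prior visits",
    ["patient: chest tightness", "doctor: how long has this lasted?"],
    "I suggest",
)
vec = to_embedding(info)
```

The resulting fixed-length vector is what the initial medical dialogue generation model would consume to produce the candidate first doctor replies.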
3. The medical dialogue generating method according to claim 1, further comprising, after the step of acquiring the answer evaluation values of the respective first doctor answer information:
acquiring a preset reply retention strategy;
and screening the first doctor answer information according to the answer retention strategy and the answer evaluation value of the first doctor answer information, and retaining at least one screened first doctor answer information.
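The claim leaves the concrete retention strategy open; a plausible sketch (purely illustrative, with a hypothetical threshold-plus-top-k policy) is:

```python
def screen_replies(replies, scores, min_score=0.5, top_k=2):
    # Keep replies meeting the threshold, best first, at most top_k of them;
    # the threshold/top-k combination is one assumed retention strategy
    ranked = sorted(zip(replies, scores), key=lambda p: p[1], reverse=True)
    kept = [r for r, s in ranked if s >= min_score][:top_k]
    # The claim requires retaining at least one screened reply, so fall back
    # to the single best reply when nothing clears the threshold
    return kept if kept else [ranked[0][0]]

kept = screen_replies(["a", "b", "c", "d"], [0.9, 0.3, 0.7, 0.6])
```

Here replies "a" and "c" survive (scores 0.9 and 0.7), while "b" falls below the threshold and "d" is cut by the top-k limit.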
4. The medical dialogue generation method according to claim 1, wherein the step of training an initial reward model based on the obtained first doctor answer information and the answer evaluation values thereof to obtain the reward model comprises:
combining the obtained reply information of each first doctor in pairs to obtain a plurality of reply information pairs;
inputting each reply information pair into an initial reward model to obtain second reward values of the two items of first doctor reply information in the reply information pair;
calculating the prediction error of the reply information pair according to the second reward values and the reply evaluation values of the reply information of the two first doctors in the reply information pair;
and adjusting model parameters of the initial reward model according to the prediction error of each reply information pair until the model converges, to obtain the reward model.
5. The medical dialogue generating method according to claim 4, wherein the step of calculating the prediction error of the reply information pair based on the second reward values and the answer evaluation values of the two items of first doctor reply information in the reply information pair comprises:
comparing the answer evaluation values of the two items of first doctor reply information in the reply information pair to obtain a first relation;
comparing the second reward values of the two items of first doctor reply information to obtain a second relation;
and calculating the prediction error of the reply information pair according to the first relation and the second relation.
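The patent does not fix an exact error formula for claims 4 and 5; a Bradley-Terry-style pairwise ranking loss is one standard way to penalise disagreement between the two relations, sketched here with hypothetical names and toy numbers:

```python
import itertools
import math

def pair_loss(reward_pref, reward_other):
    # -log sigmoid(r_pref - r_other): small when the reward model ranks the
    # humanly preferred reply higher, large when the order is inverted
    return -math.log(1.0 / (1.0 + math.exp(-(reward_pref - reward_other))))

def prediction_error(reward_a, reward_b, eval_a, eval_b):
    # First relation: order of the human answer evaluation values;
    # second relation: order of the model's second reward values
    if eval_a == eval_b:
        return 0.0
    if eval_a > eval_b:
        return pair_loss(reward_a, reward_b)
    return pair_loss(reward_b, reward_a)

# reply -> (second reward value, answer evaluation value); toy data
scored = {"A": (1.2, 0.9), "B": (0.3, 0.4), "C": (0.8, 0.7)}
total_error = sum(
    prediction_error(scored[x][0], scored[y][0], scored[x][1], scored[y][1])
    for x, y in itertools.combinations(scored, 2)
)
```

When the reward ordering agrees with the evaluation ordering (as in the toy data), the loss is small; an inverted ordering would dominate the total error and drive the parameter update of claim 4's final step.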
6. The medical dialogue generating method according to claim 1, wherein the step of performing reinforcement learning training on the initial medical dialogue generating model according to the obtained second doctor answer information and the corresponding first reward value thereof, to obtain a medical dialogue generating model includes:
calculating a cumulative reward value based on the obtained first reward values of the second doctor reply information;
and adjusting model parameters of the initial medical dialogue generation model in the direction that maximizes the cumulative reward value, and inputting the second dialogue information into the parameter-adjusted initial medical dialogue generation model for iterative training until the model converges, to obtain the medical dialogue generation model.
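Illustratively (the discount factor and REINFORCE-style update are assumptions; the claim only requires accumulating reward values and updating in the maximizing direction):

```python
def cumulative_reward(first_reward_values, gamma=0.99):
    # Discounted sum of the per-reply reward values; a plain sum (gamma=1)
    # would equally fit the claim
    total, discount = 0.0, 1.0
    for r in first_reward_values:
        total += discount * r
        discount *= gamma
    return total

def ascend(params, grads, cum_reward, lr=0.01):
    # Parameters move along the gradient scaled by the cumulative reward,
    # i.e. gradient ASCENT in the direction that maximizes that reward
    return [p + lr * cum_reward * g for p, g in zip(params, grads)]

R = cumulative_reward([1.0, 1.0], gamma=0.5)
new_params = ascend([0.0, 0.0], [1.0, -1.0], R)
```

Iterating this update over fresh second dialogue information until convergence is what turns the initial model into the final medical dialogue generation model.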
7. The medical dialogue generating method according to claim 1, characterized by further comprising, after the step of obtaining third doctor response information:
displaying the third doctor reply information through a terminal held by a doctor;
and when a confirmation instruction for the third doctor reply information is received, sending the third doctor reply information to a terminal held by the patient for display.
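The confirm-then-forward flow of claim 7 can be sketched with toy terminal objects (all class and method names here are hypothetical stand-ins for the real terminals and messaging layer):

```python
class Terminal:
    # Toy stand-in for a doctor's or patient's device
    def __init__(self, auto_confirm=False):
        self.shown = []
        self.auto_confirm = auto_confirm

    def display(self, text):
        self.shown.append(text)

    def confirm(self, text):
        # Stands in for waiting on the doctor's confirmation instruction
        return self.auto_confirm

def deliver_reply(reply, doctor_terminal, patient_terminal):
    doctor_terminal.display(reply)        # show the suggestion to the doctor first
    if doctor_terminal.confirm(reply):    # forward only after confirmation
        patient_terminal.display(reply)
        return True
    return False

doctor = Terminal(auto_confirm=True)
patient = Terminal()
ok = deliver_reply("Please rest and stay hydrated.", doctor, patient)
```

The key design point is that the patient terminal never sees the generated reply unless the doctor has confirmed it.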
8. A medical dialogue generating device, comprising:
the first input module is used for acquiring first dialogue information, inputting the first dialogue information into the initial medical dialogue generation model to obtain a plurality of items of first doctor answer information, and acquiring an answer evaluation value of each item of first doctor answer information;
the reward training module is used for training an initial reward model according to the obtained first doctor answer information and the corresponding answer evaluation values to obtain a reward model, wherein the reward value output by the reward model for an item of doctor answer information matches the answer evaluation value of that item;
the second input module is used for acquiring second dialogue information and inputting the second dialogue information into the initial medical dialogue generation model to obtain second doctor response information;
the reward input module is used for inputting the second doctor answer information into the reward model to obtain a first reward value;
the reinforcement learning module is used for performing reinforcement learning training on the initial medical dialogue generation model according to the obtained second doctor answer information and the corresponding first reward value to obtain a medical dialogue generation model;
the answer generation module is used for acquiring third dialogue information and inputting the third dialogue information into the medical dialogue generation model to obtain third doctor answer information, wherein the first dialogue information, the second dialogue information and the third dialogue information each comprise a plurality of items of information of the same types.
9. A computer device comprising a memory having stored therein computer-readable instructions which, when executed by a processor, implement the steps of the medical dialogue generation method of any of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement the steps of the medical dialogue generation method of any of claims 1 to 7.
CN202310800603.9A 2023-06-30 2023-06-30 Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium Pending CN116956831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310800603.9A CN116956831A (en) 2023-06-30 2023-06-30 Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310800603.9A CN116956831A (en) 2023-06-30 2023-06-30 Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116956831A true CN116956831A (en) 2023-10-27

Family

ID=88457575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310800603.9A Pending CN116956831A (en) 2023-06-30 2023-06-30 Medical dialogue generation method, medical dialogue generation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116956831A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination