CN114999676A

CN114999676A - Method, system, apparatus and medium for automatically replying to medical consultation

Info

Publication number: CN114999676A
Application number: CN202210747520.3A
Authority: CN
Inventors: 伏冠宇; 彭爽; 杨明晖
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-06-29
Filing date: 2022-06-29
Publication date: 2022-09-02

Abstract

Provided is a computer-implemented method of automatically replying to a medical consultation, including: acquiring a medical dialogue preamble, wherein the medical dialogue preamble includes a medical consultation; retrieving, using a retrieval model and based on the medical dialogue preamble, one or more first candidate replies to the medical consultation from a medical question-and-answer knowledge base; automatically generating, using a dialogue generation model and based on the medical dialogue preamble, one or more second candidate replies to the medical consultation, wherein the one or more first candidate replies and the one or more second candidate replies form a candidate reply set; scoring the candidate replies in the candidate reply set using a scoring model; and selecting a reply to the medical consultation from the candidate reply set based on the scoring results of the candidate replies. The present application also relates to associated systems, apparatuses, and media. The scheme of the application can reply to medical consultations automatically with greater flexibility and reliability.

Description

Method, system, apparatus and medium for automatically replying to medical consultation

Technical Field

One or more embodiments of the present specification relate to natural language processing, and more particularly, to a method, system, apparatus, and computer-readable storage medium for automatically replying to a medical consultation.

Background

With the rapid development of the internet, online question answering has become an important way for users to quickly acquire knowledge and solve problems. However, manual question-answering systems, in which answers are provided by real people, have limitations. For example, in professional fields such as finance and medicine, the number of professionals on online question-and-answer platforms falls far short of actual demand. This gap between the shortage of professionals and users' need to ask questions leads to problems such as heavy workloads for professionals and delayed feedback for users. In the medical field, for instance, China currently has only 2.44 medical practitioners per thousand people, and problems such as insufficient consultation resources, heavy pressure on doctors, and long patient waiting times have persisted for a long time.

For rural areas, especially remote mountain villages, medical conditions are even worse owing to a shortage of medical personnel, inconvenient transportation, language barriers, patients' lack of relevant knowledge, and the like. Existing telemedicine systems mostly rely on remote consultation, in which real doctors serve people over a remote audio or video connection, but this approach still cannot solve problems such as the shortage of medical staff and language barriers.

In addition, residents of remote mountain villages may express themselves differently from city residents. A medical question-and-answer knowledge base sampled mainly from city residents may therefore contain no exact match for a medical consultation from a mountain-village resident, and this insufficient generalization ability makes it difficult for a consultation service based on the knowledge base alone to serve residents of remote mountain areas.

Therefore, there is a need for a solution capable of automatically replying to medical consultation, and particularly, for a solution capable of providing a medical consultation service to people in remote mountain villages.

Disclosure of Invention

To overcome the defects of the prior art, one or more embodiments of the present specification combine a dialogue generation system with a retrieval system to provide a solution for automatically replying to medical consultations that generalizes better and replies more accurately.

One or more embodiments of the present specification achieve the above objects by the following technical solutions.

In one aspect, there is provided a computer-implemented method of automatically replying to a medical consultation, comprising: acquiring a medical dialogue preamble that includes a medical consultation; retrieving, using a retrieval model and based on the medical dialogue preamble, one or more first candidate replies to the medical consultation from a medical question-and-answer knowledge base; automatically generating, using a dialogue generation model and based on the medical dialogue preamble, one or more second candidate replies to the medical consultation, the one or more first candidate replies and the one or more second candidate replies constituting a candidate reply set; scoring the candidate replies in the candidate reply set using a scoring model; and selecting a reply to the medical consultation from the candidate reply set based on the scoring results of the candidate replies.

Preferably, the dialogue generation model is trained using a dialogue library associated with a specific region.

Preferably, acquiring the medical dialogue preamble includes receiving telephone speech from a user and converting the telephone speech to text, and the method further comprises converting the reply to telephone speech for output to the user's telephone.

Preferably, the telephone speech is in a dialect, and converting the telephone speech to text comprises converting the dialect to Mandarin text, and/or converting the reply to telephone speech comprises converting the reply to telephone speech in dialect form.

Preferably, the method further comprises: determining role information associated with the medical dialogue preamble, the role information indicating whether the sender of a session in the dialogue preamble is a patient or a doctor; and using the medical dialogue preamble, processed based on the role information, as input to the dialogue generation model.

Preferably, the method further comprises: predicting entity information associated with the reply, the entity information relating to one or more of a symptom, a cause, a drug, and a treatment plan of the patient; and using the predicted entity information together with the medical dialogue preamble as input to the dialogue generation model.

Preferably, the scoring model is trained based on a set of medical dialog samples, each dialog sample comprising a medical dialog preamble-reply pair, the dialog samples being divided into positive samples and negative samples, wherein replies in the positive samples match the medical dialog preamble and replies in the negative samples do not match the medical dialog preamble.

Preferably, the method further comprises: automatically obtaining at least a portion of a medical record of a patient from a medical facility based on the medical dialogue preamble; and converting the at least a portion of the medical record into text as part of the medical dialogue preamble.

Preferably, the method further comprises: sending, based on the medical dialogue preamble, an instruction for acquiring health condition data of the patient to a health condition acquisition device; receiving the health condition data from the health condition acquisition device; and converting the health condition data to text as part of the medical dialogue preamble.

Preferably, the method further comprises: selectively saving the reply in the medical question-and-answer knowledge base based on feedback on the reply.

Preferably, the method further comprises: automatically calling medical emergency services for the patient; or automatically making a hospital appointment for the patient.

Preferably, at least a part of the medical dialogue preamble is from a doctor, and the method is used to assist the doctor in performing a diagnosis.

In another aspect, there is provided a computer-implemented system for automatically replying to a medical consultation, comprising: a retrieval model for retrieving, from a medical question-and-answer knowledge base and based on a medical dialogue preamble, one or more first candidate replies to a medical consultation included in the medical dialogue preamble; a dialogue generation model for automatically generating one or more second candidate replies to the medical consultation based on the medical dialogue preamble, the one or more first candidate replies and the one or more second candidate replies constituting a candidate reply set; a scoring model for scoring the candidate replies in the candidate reply set; and a reply module for selecting a reply to the medical consultation from the candidate reply set based on the scoring results of the candidate replies.

Preferably, the system further comprises a preprocessing module for converting telephone speech received from the user into text, and the reply module is further used for converting the reply into telephone speech in dialect form.

Preferably, the system further comprises: a role determination model for determining role information associated with the medical dialogue preamble, the role information indicating whether the sender of a session in the dialogue preamble is a patient or a doctor, wherein the medical dialogue preamble, processed based on the role information, is used as input to the dialogue generation model.

Preferably, the system further comprises: an entity prediction model for predicting entity information associated with the reply, the entity information relating to one or more of a symptom, a cause, a drug, and a treatment plan of the patient, wherein the predicted entity information is input to the dialogue generation model along with the medical dialogue preamble.

Preferably, the system further comprises a medical record acquisition module for automatically acquiring at least a portion of a medical record of a patient from a medical facility based on the medical dialogue preamble, wherein the at least a portion of the medical record is included as part of the medical dialogue preamble.

Preferably, the system further comprises a health data acquisition module for: sending, based on the medical dialogue preamble, an instruction for acquiring health condition data of the patient to a health condition acquisition device; receiving the health condition data from the health condition acquisition device; and converting the health condition data to text as part of the medical dialogue preamble.

Preferably, the system further comprises: an emergency service module for automatically calling medical emergency services for the patient; or an appointment module for automatically making a hospital appointment for the patient.

Preferably, the system further comprises: a saving module for selectively saving the reply in the medical question-and-answer knowledge base based on feedback on the reply.

In yet another aspect, an apparatus for automatically replying to a medical consultation is provided, comprising: a memory; and a processor configured to perform the method of any of the above.

In yet another aspect, a computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the above-described method is provided.

Compared with the prior art, one or more embodiments of the present specification can achieve one or more of the following technical effects:

a higher-quality medical consultation service can be provided automatically, addressing the problem of an insufficient number of doctors;

more convenient service is provided for people in remote mountain villages and similar areas;

the scheme adapts to patients from different regions and with different living habits; and

the scheme is suitable for a variety of medical consultation scenarios.

Drawings

The foregoing summary, as well as the following detailed description of the embodiments, is better understood when read in conjunction with the appended drawings. It is to be noted that the appended drawings are intended as examples of the claimed invention. In the drawings, like reference characters designate the same or similar elements.

Fig. 1 is a schematic diagram illustrating an example of an application scenario of an auto question-answering scheme according to an embodiment of the present specification.

Fig. 2 shows a schematic diagram of an overall process for generating a reply to a query in accordance with an embodiment of the present description.

FIG. 3 shows a schematic diagram of a process for automatic question answering according to an embodiment of the present description.

Fig. 4 is a schematic diagram illustrating an operation of a process for performing preprocessing on a dialog preamble according to an embodiment of the present description.

Fig. 5 illustrates contents before and after data cleaning is performed on the dialog preamble according to an embodiment of the present description.

Fig. 6 illustrates a schematic diagram of an example model for performing role determination in accordance with an embodiment of the present description.

FIG. 7 illustrates a schematic diagram of an example model for performing entity prediction in accordance with an embodiment of the present description.

FIG. 8 illustrates a schematic diagram of an example model for performing reply generation in accordance with an embodiment of the present description.

FIG. 9 illustrates a schematic diagram of processing raw data into input for a dialog generation model according to an embodiment of the present description.

Fig. 10 shows an example of a process that may be used to vectorize a dialog context or query.

FIG. 11 illustrates an example of a process for retrieving candidate replies using the FAISS model in accordance with embodiments of the present specification.

FIG. 12 shows a schematic diagram of a scoring model for scoring replies, according to an embodiment of the present description.

FIG. 13 shows a schematic flow diagram of an example automatic question-answering method implemented by a computer in accordance with an embodiment of the present specification.

FIG. 14 illustrates a schematic flow chart diagram of another example automatic question-answering method in accordance with an embodiment of the present specification.

FIG. 15 illustrates a schematic diagram of an example automatic question-answering system, according to an embodiment of the present specification.

Fig. 16 shows a schematic flow diagram of a method of automatically replying to medical advice in accordance with an embodiment of the present description.

Fig. 17 shows a schematic diagram of an example system for responding to medical advice in accordance with an embodiment of the present description.

FIG. 18 shows a schematic block diagram of an apparatus for implementing a system in accordance with one or more embodiments of the present description.

Detailed Description

The following detailed description is sufficient to enable any person skilled in the art to understand the technical content of one or more embodiments of the present specification and to implement the same, and the objects and advantages related to one or more embodiments of the present specification can be easily understood by those skilled in the art from the description, claims and drawings disclosed in the present specification.

Referring to fig. 1, a schematic diagram of an example of an application scenario of an automatic question-answering scheme according to an embodiment of the present specification is shown. Specifically, FIG. 1 shows a schematic view of an interface 100 for a machine consultation session. It should be appreciated that the application scenario in FIG. 1 is merely an example and not a limitation. Embodiments of the present description are not limited to machine consultation, but may be applied to a variety of automated dialog scenarios, including, but not limited to: intelligent customer service, pre-sale consultation, agent assistance, intelligent chat, knowledge inquiry, and the like. Further, the automated question answering system may be implemented using interfaces other than the one shown in FIG. 1, such as web pages, desktop clients, smartphone applications, applets, SMS messages, voice calls, video calls, and so forth. The interface may include more, fewer, or different interface elements than the example in fig. 1.

As shown in fig. 1, the interface 100 may include a conversation preamble 102. The dialog preamble refers to the dialog content that has already appeared in the current conversation. In fig. 1, the following dialog preamble 102 is shown: "I have acute diarrhea, and going to the hospital for an inpatient infusion had no effect. How should it be treated?", "Hello, how long have you had the diarrhea?", "2 to 3 weeks!", "Are there any other obvious symptoms along with the diarrhea?", "The frequency is relatively high, and my belly hurts after each bout of diarrhea; there was no relief after the infusion." The conversation preamble 102 may include multiple sessions, where each utterance in the conversation may be referred to as a session. The example of FIG. 1 shows 5 sessions.

It should be appreciated that while the conversation preamble in fig. 1 includes only text, the conversation preamble of the present specification is not so limited, but may include, for example and without limitation: text, images, audio, video, emoticons, documents, and so forth. The sessions in the conversation preamble may be sent by one or more participants; in the machine consultation example of fig. 1, they are sent by a smart physician 102, which may be implemented by an automated question answering system as described in embodiments of the present specification, and a patient 104, who may be a human participant (hereinafter referred to as a "user"). Typically, the dialog preamble may include content sent by the user (e.g., patient 104) and content sent by the automated question answering system (e.g., smart physician 102), although either may be absent. Although only a single user is shown in fig. 1, embodiments of the present description are equally applicable in scenarios where multiple users are present, such as group chats or other multi-party conversations.

As shown in fig. 1, the dialog preamble may include a query 108 to be answered by the automated question answering system. In most cases, the query is the last session or sessions sent by the user in the dialog preamble, as in FIG. 1: "The frequency is relatively high, and my belly hurts after each bout of diarrhea; there was no relief after the infusion." In some cases, the query may also be one or more earlier sessions sent by the user.
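The common case described above, in which the query is the trailing session or sessions sent by the user, can be sketched as follows. This is a minimal illustration only: the `(role, text)` tuple representation and the function name `extract_query` are assumptions made for the example and do not appear in the specification.

```python
from typing import List, Tuple

# Assumed representation: each session is a (role, text) pair.
Session = Tuple[str, str]


def extract_query(preamble: List[Session], user_role: str = "patient") -> List[str]:
    """Return the trailing run of sessions sent by the user, oldest first."""
    query: List[str] = []
    for role, text in reversed(preamble):
        if role != user_role:
            break  # stop at the first session not sent by the user
        query.append(text)
    return list(reversed(query))


preamble = [
    ("patient", "I have acute diarrhea. How should it be treated?"),
    ("doctor", "Hello, how long have you had the diarrhea?"),
    ("patient", "2 to 3 weeks!"),
]
query = extract_query(preamble)  # the last user session(s)
```

If the preamble ends with a session from the system rather than the user, the function returns an empty list, which corresponds to the less common case where the query has to be found among earlier sessions.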

It should be appreciated that the "query" of the present description may be any form of content to which the automated question answering system is expected to respond. The query may be an explicit question or a non-explicit one. Explicit questions may be, for example, interrogative sentences, e.g. "How should it be treated?" and the like. Non-explicit queries may take other forms, such as "2 to 3 weeks!": although this is only a declarative sentence, the user may expect the automated question answering system to respond to it, so it may also be regarded as a query. In the example of FIG. 1, for the query "2 to 3 weeks!", the smart doctor replies "Are there any other obvious symptoms along with the diarrhea?". Furthermore, the query may be a simple greeting, such as "Hi" or "hello", or a simple emoticon, etc.

As shown in FIG. 1, the interface 100 may include an input area 110. The user may enter content, such as a query 108, through the input area 110. For example, the user may enter text through a text box in the input area 110, may click a microphone button to enter speech, may click a smiley icon to enter an emoticon, or may click a plus icon to enter other forms of content, such as a document, etc.

Embodiments of the present description may be used to generate a reply to the query 108.

Referring to fig. 2, a schematic diagram of an overall process 200 for generating a reply to a query in accordance with an embodiment of the present description is shown.

As shown in fig. 2, process 200 may include: at operation 202, a conversation preamble may be received. As described above, various forms of conversation preambles may be received in various ways. For example, the conversation preamble may be received by way of a web page, a desktop client, a smartphone application, an applet, an SMS message, a voice call, a video phone, or a combination thereof. For example, in an intelligent customer service scenario, a dialog preamble may be received through a customer service dialog function in an e-commerce application.

As described above, the dialog preamble may take various forms, including but not limited to: text, images, audio, video, emoticons, documents, and so forth.

It should be understood that a portion of the content of the dialog preamble may have been generated by an automated question answering system according to an embodiment of this specification; such content need not be received, and only the content sent by the user is received. In that case, the content sent by the user, together with the content generated (and usually stored) by the automated question answering system, may constitute the dialog preamble (usually arranged in the order in which the sessions were sent). Therefore, "receiving the dialog preamble" herein may refer to receiving the entire dialog preamble, or to receiving only the content sent by the user in the dialog preamble. For convenience of explanation, both scenarios are collectively referred to as "receiving the dialog preamble" hereinafter.
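As a rough sketch of the case where only the user's content is received, the dialog preamble could be assembled by merging the received sessions with the system's stored replies in sending order. The `Session` fields and the `assemble_preamble` helper below are hypothetical names chosen for illustration, and the use of a numeric timestamp as the ordering key is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Session:
    sent_at: float  # assumed ordering key, e.g. a Unix timestamp
    role: str       # "patient" or "doctor"
    text: str


def assemble_preamble(received: List[Session], stored: List[Session]) -> List[Session]:
    """Merge user-sent and system-generated sessions by sending order."""
    return sorted(received + stored, key=lambda s: s.sent_at)


received = [
    Session(1.0, "patient", "I have acute diarrhea."),
    Session(3.0, "patient", "2 to 3 weeks!"),
]
stored = [Session(2.0, "doctor", "Hello, how long have you had the diarrhea?")]
preamble = assemble_preamble(received, stored)
```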

As described above, the dialog preamble may typically include a query to be replied to by the automated question answering system, such as query 108 in FIG. 1.

After receiving the dialog preamble, process 200 may include: at operation 204, a reply to the query included in the dialog preamble may be generated. This operation is generally performed by an automated question answering system according to embodiments of the present specification, and is a key part of those embodiments, which will be described in detail below.

After generating the reply to the query, process 200 may include: at operation 206, the reply to the query may be output. Various forms of replies may be output in various ways. For example, the reply may be output via a web page, desktop client, smartphone application, applet, SMS message, voice call, video call, etc. In general, the reply may be output in the same manner in which the dialog preamble was received. For example, in the example of fig. 1, the generated reply may be displayed in the interface 100 below the query 108.

For example, the reply may also be output in the form of text, images, audio, video, emoticons, documents, and so forth.

Referring to fig. 3, a schematic diagram of a process 300 for automatic question answering according to an embodiment of the present description is shown.

As shown in fig. 3, after receiving a conversation preamble 302, the process 300 may include: optionally, pre-processing the conversation preamble.

After preprocessing the conversation preamble, the process 300 may include: at operation 312, a reply retrieval operation may be performed to obtain one or more first candidate replies.

In parallel with, in series with, or in another order relative to the reply retrieval operation, process 300 may include: at operation 310, a reply generation operation may be performed to generate one or more second candidate replies. In general, this operation 310 may be performed by a dialog generation model, which may be a machine learning model.

Prior to generating the candidate replies, the process 300 may include: performing role determination at operation 306 and/or entity prediction at operation 308.

The one or more first candidate replies and the one or more second candidate replies may constitute a set of candidate replies.

Subsequently, the process 300 may include: at operation 314, a reply scoring operation may be performed on the replies in the candidate reply set. In general, this operation 314 may be performed by a scoring model.
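The scoring model is described earlier as being trained on dialog preamble-reply pairs split into positive samples (the reply matches the preamble) and negative samples (it does not). One way such a sample set might be constructed is sketched below. Note the assumption: drawing each negative reply at random from a different dialog is a common technique substituted here for illustration; the specification does not state how negative samples are obtained. The name `build_scoring_samples` is likewise hypothetical.

```python
import random
from typing import List, Tuple

Dialog = Tuple[List[str], str]       # (preamble sessions, actual reply)
Sample = Tuple[List[str], str, int]  # (preamble, reply, label: 1 = match, 0 = mismatch)


def build_scoring_samples(dialogs: List[Dialog], seed: int = 0) -> List[Sample]:
    """Pair each preamble with its real reply (positive) and a foreign reply (negative)."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for i, (preamble, reply) in enumerate(dialogs):
        samples.append((preamble, reply, 1))
        # Assumption: a reply taken from another dialog does not match this preamble.
        others = [r for j, (_, r) in enumerate(dialogs) if j != i and r != reply]
        if others:
            samples.append((preamble, rng.choice(others), 0))
    return samples


dialogs = [
    (["I have had diarrhea for two weeks."], "Do you have any other symptoms?"),
    (["My knee hurts when I climb stairs."], "Has the knee been injured before?"),
]
samples = build_scoring_samples(dialogs)
```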

Finally, the process 300 may output a reply, which may be, for example, the reply with the highest score.
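The overall flow of process 300 (pool the retrieved and generated candidates, score them, output the highest-scoring one) can be sketched as below. The three models are passed in as plain callables; `toy_retrieve`, `toy_generate`, and `toy_score` are hypothetical stand-ins invented for this example, and the word-overlap score is only a placeholder for a trained scoring model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    text: str
    source: str  # "retrieval" or "generation"


def select_reply(
    preamble: Sequence[str],
    retrieve: Callable[[Sequence[str]], List[str]],
    generate: Callable[[Sequence[str]], List[str]],
    score: Callable[[Sequence[str], str], float],
) -> Candidate:
    """Pool first and second candidate replies, score them, return the best."""
    candidates = [Candidate(t, "retrieval") for t in retrieve(preamble)]
    candidates += [Candidate(t, "generation") for t in generate(preamble)]
    if not candidates:
        raise ValueError("no candidate replies were produced")
    return max(candidates, key=lambda c: score(preamble, c.text))


# Hypothetical stand-ins for the retrieval, generation, and scoring models.
def toy_retrieve(preamble: Sequence[str]) -> List[str]:
    return ["Please drink plenty of fluids."]


def toy_generate(preamble: Sequence[str]) -> List[str]:
    return ["How often does the diarrhea occur each day?"]


def toy_score(preamble: Sequence[str], reply: str) -> float:
    # Placeholder score: number of words shared with the last session.
    last = set(re.findall(r"[a-z]+", preamble[-1].lower()))
    words = set(re.findall(r"[a-z]+", reply.lower()))
    return float(len(last & words))


best = select_reply(
    ["I have had diarrhea for two weeks."], toy_retrieve, toy_generate, toy_score
)
```

Passing the models as callables keeps the selection step independent of how retrieval, generation, and scoring are implemented, which matches the description's point that the two candidate sources may run in parallel or in any order.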

Next, specific details of each operation in fig. 3 are described specifically.

Preprocessing

Referring to fig. 4, a schematic diagram illustrating the operation of a process 400 for performing preprocessing of conversation preambles in accordance with an embodiment of the present description is shown.

As shown in fig. 4, process 400 may include: optionally, at operation 402, if the dialog preamble includes content in non-text form, that content may be converted to text form to obtain the dialog preamble text.

For example, for dialog preamble content in the form of images, audio, video, etc., image processing and/or audio processing may be performed on the content to identify the words therein, thereby converting the dialog preamble content into text.

For dialog preamble content in emoticon form, a conversion may be performed to convert the emoticon into corresponding text. For example, a "smile" emoticon may be converted to the text "smile", a "thumbs-up" emoticon may be converted to the text "praise", and so on.
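The emoticon conversion can be illustrated with a simple lookup table. The bracketed codes such as `[smile]` and the table `EMOTICON_TEXT` are assumptions for the sketch; the specification does not say how emoticons are encoded in the incoming message.

```python
import re

# Hypothetical emoticon codes mapped to replacement text.
EMOTICON_TEXT = {
    "[smile]": "smile",
    "[thumbs-up]": "praise",
}

_EMOTICON_PATTERN = re.compile("|".join(re.escape(code) for code in EMOTICON_TEXT))


def emoticons_to_text(message: str) -> str:
    """Replace each known emoticon code with its corresponding text."""
    return _EMOTICON_PATTERN.sub(lambda m: EMOTICON_TEXT[m.group(0)], message)


converted = emoticons_to_text("Thank you doctor [thumbs-up]")
```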

For the document form of the dialog preamble, the title of the document may be read, the content of the document may be read, keywords of the document may be extracted, the abstract of the document may be extracted, and so on, or a combination thereof.

The above-described processing may be performed in any manner known or conceivable in the art and will not be described in detail herein.

It is to be appreciated that if the dialog context does not include non-textual content, then operation 402 need not be performed.

The process 400 may also include: at operation 404, data cleaning may be performed on the dialog preamble to convert it into high-quality dialog preamble text on which subsequent operations are easy to perform. This operation may include, for example, removing duplicate punctuation marks, adding missing punctuation marks, removing line breaks and tabs in sentences, merging consecutive sessions of the same user, and so forth. Other data cleaning operations contemplated by those skilled in the art may be implemented as desired.

Referring to fig. 5, contents before and after data cleaning is performed on the dialog preamble according to an embodiment of the present description are shown.

As shown in fig. 5, compared with the data before cleaning, line feed characters "\n" and tab characters "\t" have been deleted; periods have been added to statements such as "belly pain in constipation and holding up"; the patient's two consecutive sessions "the previous day starts not to be the big previous day" and "one time per day at ordinary times" have been merged, with a comma added; and so on.
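The cleaning steps listed for operation 404 can be sketched roughly as follows. This is an illustrative approximation, not the implementation behind FIG. 5: the function names, the `(role, text)` session representation, the punctuation sets, and the choice to replace line breaks with a space (suitable for English; for Chinese text they would more likely be deleted outright) are all assumptions.

```python
import re
from typing import List, Tuple

Session = Tuple[str, str]  # assumed representation: (role, text)

_END_PUNCT = ".!?。！？"


def clean_text(text: str) -> str:
    """Clean one session: strip line breaks/tabs and normalize punctuation."""
    text = re.sub(r"[\n\t]+", " ", text)        # remove line breaks and tabs
    text = re.sub(r" {2,}", " ", text).strip()  # collapse leftover spaces
    text = re.sub(r"([,.!?;:，。！？；：])\1+", r"\1", text)  # drop repeated punctuation
    if text and text[-1] not in _END_PUNCT:
        text += "."                             # add missing final punctuation
    return text


def clean_preamble(preamble: List[Session]) -> List[Session]:
    """Clean every session and merge consecutive sessions from the same role."""
    merged: List[Session] = []
    for role, raw in preamble:
        text = clean_text(raw)
        if not text:
            continue
        if merged and merged[-1][0] == role:
            previous = merged[-1][1].rstrip(_END_PUNCT)
            merged[-1] = (role, previous + ", " + text)  # join with a comma
        else:
            merged.append((role, text))
    return merged


raw = [
    ("patient", "My belly hurts\n\tand feels bloated"),
    ("patient", "It started two days ago!!"),
    ("doctor", "How often do you go normally??"),
]
cleaned = clean_preamble(raw)
```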

The process 400 may also include: at operation 406, optionally, sample processing and selection may be performed on the dialog context.

It will be appreciated that the purpose of pre-processing the dialog preamble is to provide higher-quality input for subsequent processing. For example, one of the purposes is to provide high-quality input or training samples for subsequent machine learning models. The subsequent machine learning models may include a dialog generation model, and optionally a role determination model and an entity prediction model. Such machine learning models typically have a "training phase" and a "prediction phase". In the "prediction phase", the input is a dialog preamble, based on which the machine learning model generates one or more second candidate replies. Before the machine learning model can be used for prediction, it is trained in the training phase on a large training set. In the training phase, the machine learning model may be trained in a supervised manner, and each item in the training set may include a "training sample" and a corresponding "label", for example in the form of a (training sample, label) pair.

For a dialog generation model, the training sample may be a "dialog preamble" and the label may be the actual reply to the query in the dialog preamble.

That is, during the "training phase", although the above refers to preprocessing the "dialog preamble", the entire conversation, including both the dialog preamble and the reply, may actually be preprocessed. As shown in fig. 5, in the cleaned text, session 505 ("Gastrointestinal dysfunction or an intestinal abnormality cannot be ruled out; an imaging examination at a hospital clinic is recommended") is an actual reply, which can be used as a label in the training phase of the dialog generation model to perform supervised training on the model.

In a preferred embodiment of the present specification, the training sample of the dialog generation model may further include "entity information"; that is, the training sample may be "dialog preamble + entity information", and its corresponding label is the actual "reply" to the query in the dialog preamble. The term entity information as used herein refers to an entity having a specific meaning in a text. In some embodiments, the domain to which the entity belongs may not be limited, whereas in the preferred embodiment only domain-specific entities may be used. Taking the medical field as an example, the entity information refers to fine-grained category information related to the medical entities in the current text, including specific items under categories such as symptoms, drugs, attributes, and the like. For example, a list of particular entities may be specified, such as a list of the entities included in the categories described above. Lists may be provided separately for symptoms, drugs, and attributes, so that entities may be identified and used separately for each of these entity categories.

Thus, when training the dialog generation model, replies that include entity information may be preferentially selected for placement in the training set.
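A sketch of how per-category entity lists might be used to pick training dialogs is shown below. Everything here is illustrative: the entries in `ENTITY_LISTS`, the substring matching in `find_entities`, and the `select_training_dialogs` helper with its `limit` parameter are assumptions; the specification only says that entity lists may be provided per category and that replies containing entity information may be preferred.

```python
from typing import Dict, List, Set, Tuple

Dialog = Tuple[List[str], str]  # (dialog preamble, actual reply)

# Illustrative per-category entity lists (entries are examples only).
ENTITY_LISTS: Dict[str, Set[str]] = {
    "symptom": {"diarrhea", "abdominal pain", "constipation"},
    "drug": {"loperamide", "ibuprofen"},
    "attribute": {"duration", "frequency"},
}


def find_entities(text: str) -> Dict[str, List[str]]:
    """Return the listed entities mentioned in the text, grouped by category."""
    lowered = text.lower()
    found: Dict[str, List[str]] = {}
    for category, names in ENTITY_LISTS.items():
        hits = sorted(name for name in names if name in lowered)
        if hits:
            found[category] = hits
    return found


def select_training_dialogs(dialogs: List[Dialog], limit: int) -> List[Dialog]:
    """Keep up to `limit` dialogs, preferring those whose reply contains entities."""
    # sorted() is stable, so the original order is kept within each group.
    ranked = sorted(dialogs, key=lambda d: not find_entities(d[1]))
    return ranked[:limit]


dialogs = [
    (["My belly hurts."], "Please rest."),
    (["I have diarrhea."], "How long has the diarrhea lasted?"),
]
chosen = select_training_dialogs(dialogs, limit=1)
```

The entities found in a reply could also be appended to the preamble to form the "dialog preamble + entity information" training sample, with the reply itself as the label.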

For the role determination model, the training sample may be a "dialog preamble" or a "query", and its corresponding label may be the role associated with that dialog preamble or query.

For an entity prediction model, the training sample may be a "dialog preamble", and its corresponding label may be the entities included in the actual reply to the query in the dialog preamble.

In training the entity prediction model, replies that include entity information may also be preferentially selected for placement in the training set.

In summary, when preprocessing is performed to train the entity prediction model or the dialog generation model, replies that include entity information may be preferentially selected for the training set during the preprocessing stage. Alternatively, the training sample selection described above may be performed at a different stage.

Role determination

In order to enable the dialog generation model to better generate candidate replies, it may be helpful to know the role information for each session in the dialog context. For example, in the example of FIG. 1, it is known that the three sessions "I have got acute diarrhea, there is no effect of hospitalization infusion, how to treat?", "2 to 3 hula!" and "frequency is relatively high and each time the diarrhea feels the belly sore, there is no relief after the infusion" are sent by the user (human patient), while the two sessions "hello, ask for how long the diarrhea is?" and "when asking for diarrhea, there is obvious symptom?" are sent by a smart doctor (e.g., the automated question and answer system of the present illustrative embodiment). With knowledge of the role information of the sessions in the dialog context, the role determination may no longer need to be performed.

However, in some cases, the received dialog context does not necessarily include the role of the sender of the session. For example, in some examples, only content of a conversation preamble including multiple sessions may be received, without including role information in the conversation preamble.

The embodiments of the present description are equally applicable to such a case. In such a case, to improve the quality of the reply generated by the dialog generation model, the role of the sender of the session in the dialog context may be determined by the role determination model.

Referring to fig. 6, a schematic diagram of an example model for performing role determination is shown, according to an embodiment of the present description.

The example role determination model 600 of fig. 6 is implemented based on the BERT model or a variant thereof. The BERT model is a machine learning model for natural language processing that typically takes sentences or sentence pairs (e.g., "question" - "reply" pairs) as input and provides corresponding output according to task needs (e.g., based on tags).

As shown in fig. 6, the role determination model 600 receives a session 604 for which role information (sender) is to be predicted. In a preferred embodiment of the present description, the role determination model 600 may also receive the context 602 of the session 604 as additional information. This context 602 may be, for example, the session immediately preceding the session 604, such as "that stomach is uncomfortable bringing up something?" in fig. 6. In other examples, previous sessions of the session, and/or subsequent sessions of the session, may also be used as additional information to determine role information for the session 604.

The performance of the role determination model 600 can be further enhanced by using the context of the session as additional information to determine role information for the session.

The session 604 and the context 602 are input to the BERT model 606 for processing, and may be subjected to embedding (embedding) operations before processing, such as symbol embedding (token embedding), segment embedding (segmentation embedding), position embedding (position embedding), and the like.

In an embodiment of the present specification, in training a role determination model, a session (which in a preferred example also includes the context of the session) is used as a training sample, and the role determination model is trained using the role information of the session (the sender of the session) as a label.

The trained role determination model can then be used to determine role information for a session (e.g., session 604) based on the session and its context (e.g., context 602). In the example of fig. 6, for an incoming session 604 "gastritis may be present" and the context "that stomach is strange?", the role determination model 600 may output its prediction of role information, i.e., determine its conversational role 606 to be a doctor.

Based on the concepts disclosed herein, one of ordinary skill in the art may determine a session issuer in a dialog preamble using a role determination model in any manner different from the specific manner described above.

Entity prediction

As described above, entity information may improve the performance of a dialog generation model when training the dialog generation model or using the dialog generation model to generate a dialog.

To be able to utilize the entity information, entity prediction operations (shown as 308 in FIG. 3) may be performed. Entity prediction is to predict the entities included or involved in a reply to a query included in a dialog preamble.

Referring to fig. 7, a schematic diagram of an example model for performing entity prediction is shown, according to an embodiment of the present description.

The example entity prediction model 700 of fig. 7 is implemented based on a BERT model or a variant thereof.

As shown in FIG. 7, the entity prediction model 700 receives as input a dialog preamble 702. After processing at input 704, symbol embedding 706, fragment embedding 708, position embedding 710, encoder 712 (which may be implemented with a BERT model), symbolic representation 714, pooling layer 716, and Softmax 718, etc., entity prediction model 700 may output as tags entities included or involved in replies to queries included in the conversation preamble.

In an embodiment of the present specification, when training an entity prediction model, the entity prediction model is trained using a dialog preamble as a training sample and using the entities involved in the actual reply to the dialog preamble as the label. The entities involved in the actual reply may be identified, for example, by manual tagging, or in other automated ways (e.g., using a machine learning model to perform named entity recognition on the actual reply, etc.).

Preferably, the entity may be a domain-specific entity, as described above. Taking the medical field as an example, the entity information refers to fine category information related to the medical entity in the current text, including specific categories under categories such as symptoms, medicines, attributes and the like. For example, a list of particular entities may be specified, such as a list of entities included in the categories described above. For example, lists may be provided separately for symptoms, drugs, attributes, so that entities may be identified and used separately for the above entity categories.
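As a rough illustration of how per-category entity lists could be used to tag the entities in an actual reply, the following is a minimal sketch. The list contents, the category names, and the function name `find_entities` are assumptions made for this example and are not taken from the present specification; plain substring matching is used here only as a simple stand-in for manual tagging or named entity recognition.

```python
# Hypothetical per-category entity lists; a real system would curate these
# for the specific domain (e.g., symptoms, medicines, attributes).
ENTITY_LISTS = {
    "symptom": ["diarrhea", "abdominal pain"],
    "drug": ["aspirin"],
    "examination": ["enteroscopy"],
}

def find_entities(reply):
    """Return the listed entities that occur in the reply, grouped by category."""
    text = reply.lower()
    found = {}
    for category, names in ENTITY_LISTS.items():
        hits = [name for name in names if name in text]
        if hits:
            found[category] = hits
    return found

labels = find_entities("An enteroscopy can help find the cause of the diarrhea.")
print(labels)
```

The entities found in this way could serve as the label of the corresponding dialog preamble when the entity prediction model is trained.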

For example, for the conversation preamble 102 as shown in FIG. 1, the entity prediction model 700 may output its prediction of entity information, e.g., may determine that the entity included or involved in the reply is "cause" and "enteroscopy". It can be seen that in a preferred example, both "cause" and "enteroscopy" are entities in the medical field.

Entities predicted by entity prediction model 700 may then be used in subsequent dialog generation models. For example, the dialog generation model may use "dialog preamble + entity information" as input. FIG. 9 shows a schematic diagram of processing raw data into input for a dialog generation model, according to an embodiment of the present description. For example, as shown in fig. 9, input 904 may include a dialog preamble 906 and entity information 908, where the entity information is "cause enteroscopy". By using the entity information as additional information, the predictive performance of the dialog generation model can be improved.

Based on the concepts disclosed herein, one of ordinary skill in the art may utilize an entity prediction model to predict entities that may be included or involved in a reply in any manner different from the specific manner above.

Reply generation

One or more replies to the conversation in the conversation front may be generated as candidate replies using a reply generation operation, such as reply generation 310 shown in fig. 3.

In a preferred embodiment of the present description, a dialog generation model may be employed to perform the reply generation operation. The dialog generation model is preferably a GPT model or a variant thereof. The GPT model is a generative pre-trained language model developed by OpenAI, and its versions include the GPT model, the GPT-2 model, the GPT-3 model, and so on. Other dialog generation models that may be envisioned by those skilled in the art may also be used.

Compared with a retrieval model (described below), the dialogue generation model has the advantages of good flexibility, strong generalization capability and capability of obtaining a reply more relevant to a dialogue front, but has the defect of unstable generation quality. To improve the stability of the model and the quality of the reply generation, we have optimized the generative model (as shown in fig. 8).

Referring to FIG. 8, a diagram of an example model for performing reply generation is shown, in accordance with an embodiment of the present specification. In the example of fig. 8, the reply generation operation is performed using dialog generation model 800. In the example of FIG. 8, the dialog generation model 800 is a GPT model.

As shown in FIG. 8, the dialog generation model 800 receives as input a dialog preamble 802 and outputs generated candidate replies 804. As shown in FIG. 8, the dialog generation model 800 may include a Transformer encoder 806 and a Transformer decoder 808. In particular, the Transformer decoder 808 may generate a next word based on the input from the Transformer encoder 806 and the words already generated. For example, at the first position, the inputs to the Transformer decoder 808 are the output of the Transformer encoder 806 and the default starting input [S], and the first word "pull" is output; at the second position, the input to the Transformer decoder 808 becomes the output of the previous position (i.e., the first word "pull"), and the second word "pulled" is output; and so on, ultimately producing a complete output 804, which is a sentence made up of the words output in turn. Those skilled in the art understand how to implement dialog generation model 800, and the details of its operation are not described herein.
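The word-by-word generation loop described above can be sketched as follows. This is only a toy illustration: the lookup table `TOY_NEXT_WORD`, the end marker `[E]`, and the function names are assumptions made for the example, and the table stands in for a trained decoder that would actually compute a probability distribution over the vocabulary.

```python
# Toy stand-in for the decoder step (an assumption for illustration only):
# it looks up the next word from the previously generated word. A real
# decoder would also attend to the encoder output and all generated words.
TOY_NEXT_WORD = {
    "[S]": "how", "how": "long", "long": "has",
    "has": "it", "it": "lasted", "lasted": "[E]",
}

def decode_step(encoder_output, generated):
    """Return the next word given the encoder output and the words so far."""
    previous = generated[-1]
    return TOY_NEXT_WORD.get(previous, "[E]")

def generate_reply(encoder_output, max_len=20):
    generated = ["[S]"]          # default starting input
    for _ in range(max_len):
        word = decode_step(encoder_output, generated)
        if word == "[E]":        # hypothetical end-of-sentence marker
            break
        generated.append(word)   # the output of this position feeds the next
    return " ".join(generated[1:])

reply = generate_reply(encoder_output=None)
print(reply)
```

The `max_len` bound is a common safeguard so that generation stops even if the end marker is never produced.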

Preferably, the preprocessed dialog preamble is used as input. This pre-processing can be seen in the description of pre-processing 304 above.

Preferably, the dialog context is processed based on the role information and then used as input to the dialog generation model. That is, the dialog preamble with the role information is taken as input. For example, the role information may be role information determined by the operation of role determination 306, as described in detail above with reference to fig. 6.

In some examples, the role information may be used to determine whether two consecutive sessions were issued by the same user, and thus may be used to merge the sessions. For example, in the example of FIG. 9, the sessions "2 to 3 Hula!" and "diarrhea feels a good belly pain after each diarrhea." in the raw data 902 are both confirmed as being issued by the patient, so that the two sessions are merged in input 904.

In other examples, the role information is not only used to merge sessions, but may further be included as specific role information (e.g., "doctor", "patient") in the input, for use by the dialog generation model as additional information. For example, with the help of the role information "doctor", the dialog generation model may generate a reply that more closely conforms to the role of doctor.
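The two uses of role information described above (merging consecutive sessions from the same sender, and including the role itself in the input) can be sketched as follows. The function name `merge_by_role`, the `role: text` input format, and the sample sessions are assumptions made for this illustration only.

```python
def merge_by_role(sessions):
    """Merge consecutive (role, text) sessions that share the same role."""
    merged = []
    for role, text in sessions:
        if merged and merged[-1][0] == role:
            # Same sender as the previous session: append the text to it.
            merged[-1] = (role, merged[-1][1] + " " + text)
        else:
            merged.append((role, text))
    return merged

# Hypothetical raw data with role information for each session.
raw = [
    ("patient", "I have acute diarrhea."),
    ("doctor", "How long has it lasted?"),
    ("patient", "2 to 3 days."),
    ("patient", "My belly hurts each time."),
]

merged = merge_by_role(raw)
# Optionally keep the role as an explicit prefix in the model input.
model_input = " ".join(f"{role}: {text}" for role, text in merged)
print(model_input)
```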

Preferably, entity information is used as input to the dialog generation model together with the dialog preamble. Entity information may be generated, for example, by an entity prediction model, as described above for entity prediction 308.

In some examples, entity information may be added directly to the conversation preamble as input to the conversation generation model. For example, in the example of fig. 9, the predicted entity information 912 "cause enteroscopy" is added after the session preamble 910 (which may be separated by a separator [ SEP ]).
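A minimal sketch of appending predicted entity information to the dialog preamble is given below. The function name `build_input` and the exact spacing around the separator are assumptions for illustration; only the use of a [SEP] separator between the preamble and the entity information comes from the example above.

```python
def build_input(preamble, entities, sep="[SEP]"):
    """Append space-joined entity information to the preamble after a separator."""
    if not entities:
        return preamble          # no predicted entities: use the preamble alone
    return f"{preamble} {sep} {' '.join(entities)}"

model_input = build_input("how to treat acute diarrhea", ["cause", "enteroscopy"])
print(model_input)
```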

In other examples, other ways may be used to take entity information as input to the dialog generation model. For example, a dialog preamble-entity information pair may be constructed as input to the dialog generation model.

As described above, the machine learning model may be trained using the dialog preambles as training samples and the corresponding actual replies as labels.

Preferably, the dialog context including the role information may be preferentially selected as a training sample, so that the machine learning model can learn through training to use the role information for generating the reply.

Preferably, replies including entity information may be preferentially placed into the training set. It can be appreciated that for supervised learning, a training set can include a plurality of training examples, each of which can include a training sample and a corresponding label. In the embodiment of the present specification, when training the dialog generation model, the dialog preamble is used as a training sample, and the actual reply is used as a label. At this point, those training examples that include entity information in the actual reply may be preferentially selected to train the dialog generation model. In this way, the machine learning model can be enabled to use entity information to generate a reply.
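One possible way to give preference to training examples whose actual reply includes entity information is sketched below. The function name, the dictionary layout of an example, and the use of a fixed `limit` are assumptions made for this illustration; the specification does not prescribe a particular selection procedure.

```python
def select_training_examples(examples, entity_list, limit):
    """Pick up to `limit` examples, preferring replies that mention a listed entity."""
    def has_entity(example):
        reply = example["reply"].lower()
        return any(entity in reply for entity in entity_list)

    # sorted() is stable and False sorts before True, so examples whose
    # reply contains an entity come first, in their original order.
    ranked = sorted(examples, key=lambda ex: not has_entity(ex))
    return ranked[:limit]

# Hypothetical examples: dialog preamble as the sample, actual reply as the label.
examples = [
    {"preamble": "p1", "reply": "Drink more water."},
    {"preamble": "p2", "reply": "An enteroscopy is advised."},
    {"preamble": "p3", "reply": "Take aspirin."},
]
selected = select_training_examples(examples, ["enteroscopy", "aspirin"], limit=2)
print([ex["preamble"] for ex in selected])
```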

One or more replies generated by the dialog generation model are selected as candidate replies for final output to the user.

Based on the concepts disclosed herein, one of ordinary skill in the art may generate a reply using any dialog generation model in any manner other than the specific manner described above.

One way to obtain candidate replies is presented above, namely to use a dialog generation model to generate replies as candidate replies. A retrieval model may also be employed to retrieve replies as candidate replies, as discussed below.

Reply retrieval

In a preferred embodiment of the present description, a retrieval model may be employed to perform the reply retrieval operation. Any dialog retrieval model conceivable by those skilled in the art may be employed to perform the reply retrieval. Preferably, the retrieval may be performed using any vector retrieval model. More preferably, a FAISS model can be employed to perform the reply retrieval. FAISS is an abbreviation of Facebook AI Similarity Search, a vector similarity search library developed by Facebook.

In a preferred embodiment, a vector retrieval model may be employed to perform a search of a knowledge base, which may include a plurality of entries, each of which may be a query-reply pair or a dialog preamble-reply pair. Preferably, the knowledge base is a domain-specific knowledge base. For example, in the context of a smart doctor application, the knowledge base may include query-reply pairs or dialog preamble-reply pairs relating to medical treatment.

Preferably, to facilitate searching the knowledge base using the vector retrieval model, the entries in the knowledge base may first be vectorized. That is, vectorization may be performed on a dialog preamble-reply pair in the knowledge base, or on a query-reply pair.

In some examples, only the dialog preambles or queries may be vectorized, and no vectorization may be performed on the corresponding replies.

For example, a Word2Vec model may be employed to train a low-dimensional dense vector representation for each word, while all dialog preambles or queries may be represented as sentence vectors by an average word vector model (WAM).

Referring to fig. 10, an example of a process that may be used to vectorize a dialog context or query is shown.

As shown in fig. 10, the average word vector model may use a word sequence as input 1002. In the examples of the present application, the word sequence is a dialogue preamble or a query.

Word embedding 1004 is then performed on the input word sequence. The embedding may be trained, for example, with Word2Vec, resulting in a low-dimensional dense vector representation 1006 for each word.

Averaging 1008 may then be performed on the low-dimensional dense vector representation of each word in the sequence of words to obtain a vector representation of the input sequence of words (i.e., sentence) as output 1010. Such a model may be referred to as an average word vector model (WAM). In this way, the dialog preambles or queries in the knowledge base may be vectorized.

It will be appreciated that the reply may be vectorized in the same manner. Thus, the dialog preamble-reply pair or the query-reply pair can be vectorized.
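The averaging step can be illustrated with a short sketch. The tiny two-dimensional vector table below is made up for the example and stands in for vectors that would be trained with Word2Vec; the function name `sentence_vector` and the choice to skip out-of-vocabulary words are likewise assumptions.

```python
def sentence_vector(words, word_vectors, dim):
    """Average the vectors of the known words to get one sentence vector."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return [0.0] * dim       # no known word: fall back to a zero vector
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "acute": [1.0, 0.0],
    "diarrhea": [0.0, 1.0],
    "treat": [1.0, 1.0],
}

vec = sentence_vector(["acute", "diarrhea"], WORD_VECTORS, dim=2)
print(vec)
```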

Referring to FIG. 11, an example of a process for retrieving candidate replies using the FAISS model according to embodiments of the present specification is shown.

As shown in fig. 11, a "dialog preamble" or "query" in a domain-specific knowledge base is vectorized using an average word vector model, such that the entries in the knowledge base include vectorized entries, i.e., a vectorized "dialog preamble" and the corresponding "reply" (with or without vectorization).

The left half of fig. 11 shows the vectorization process for the knowledge base, while the right half of fig. 11 shows the vectorization process for the query (i.e., the dialog preamble or query for which a reply is to be generated). Through the vectorization process on the knowledge base, a vector index may be generated for the knowledge base so that a search may be performed on the knowledge base using a query vector.

Specifically, in some examples, mining operations may first be performed on knowledge base 1106. Preferably, the knowledge base may comprise a plurality of entries, each entry may be a dialog. Preferably, the dialog may include a dialog preamble/query and a reply to the dialog preamble/query. Preferably, the knowledge base may be a domain-specific knowledge base. For example, for a smart doctor application, the knowledge base may be WebDG. WebDG is a medical dialogue data set with entity labels. Other knowledge bases are also selectable as will occur to those of skill in the art. Performing a mining operation on knowledge base 1106 may select an available or appropriate dialog in the knowledge base.

In other examples, vectorization may be performed on each entry in the knowledge base 1106 without performing mining operations.

Each word in the dialog preambles or queries in the knowledge base may then be vectorized by the Word2Vec model 1108 using the process described above with reference to fig. 10 to obtain a low-dimensional dense vector representation of each word. The Word2Vec model may be trained, for example, using dialogs in the knowledge base 1106. The resulting low-dimensional dense vector representation may then be processed using the average word vector model (WAM) 1110 to obtain a sentence vector for each sentence of the mined dialogs in the knowledge base (e.g., each session or query in the dialog preamble). Through the above process, a vectorized knowledge base 1112 is finally obtained, which may include vector indexes.

Upon receiving the dialog preamble or query for which a reply is to be generated, the dialog preamble/query 1114 may be used as a query, which is vectorized using the process shown on the right side of fig. 11.

In particular, the same vectorization scheme may be used to vectorize the dialog preamble or query. Mining of entries in the knowledge base need not be performed. For example, the dialog preamble or query 1114 may first be processed using the Word2Vec model 1116. In general, the Word2Vec model 1116 should be the same as the Word2Vec model 1108 trained in vectorizing the knowledge base 1106. That is, when vectorizing a query (i.e., the dialog preamble/query 1114), the Word2Vec model used when vectorizing the knowledge base 1106 should be used. The vectors for each word are then averaged using the WAM model 1118 to obtain a query vector 1120, the query vector 1120 representing the vector of the input dialog preamble/query 1114.

Subsequently, a vector retrieval model can be used to retrieve the query vector 1120 in the vectorized knowledge base 1112. Preferably, the retrieval may be performed using a FAISS model.

The specific operation of the FAISS model is known to those skilled in the art. Therefore, although the foregoing may not contain all of the details, those skilled in the art will still know how to retrieve a reply using the FAISS model based on the dialog preamble and/or the query.

By retrieving, one or more replies that match or closely match the conversation preamble or query may be determined as candidate replies. For example, replies with a degree of match greater than a threshold degree of match may be selected as candidate replies. Alternatively, a specified number of replies with the highest degree of match may be selected as candidate replies.
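The two selection strategies above (a match threshold, or a fixed number of best matches) can be sketched with a plain brute-force cosine similarity search. Note that this is not FAISS itself: the brute-force scan is swapped in here only to keep the example self-contained, and the index contents, function names, and parameter names are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, top_k=None, threshold=None):
    """Return replies ranked by similarity, filtered by threshold and/or top_k."""
    scored = [(cosine(query_vec, vec), reply) for vec, reply in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    if threshold is not None:
        scored = [item for item in scored if item[0] > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [reply for _, reply in scored]

# Hypothetical vectorized knowledge base: (preamble vector, stored reply).
INDEX = [
    ([1.0, 0.0], "reply A"),
    ([0.0, 1.0], "reply B"),
    ([1.0, 1.0], "reply C"),
]

query = [1.0, 0.1]
print(retrieve(query, INDEX, top_k=2))
print(retrieve(query, INDEX, threshold=0.9))
```

A dedicated vector index would typically replace the linear scan when the knowledge base is large.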

Based on the concepts disclosed herein, one of ordinary skill in the art may utilize the retrieval model to retrieve replies in the knowledge base in any manner different from the specific manner described above.

The details of obtaining one or more candidate replies through the dialog generation model and the retrieval model, respectively, are described above. In the following, in order to distinguish between candidate replies obtained by the two different models, a candidate reply obtained by the retrieval model is sometimes referred to as a first candidate reply, and a candidate reply generated by the dialog generation model is referred to as a second candidate reply. The one or more first candidate replies and the one or more second candidate replies together constitute a set of candidate replies.

In most cases, only one optimal reply is output to the user. Thus, in embodiments of the present specification, the replies in the set of candidate replies may be scored by a scoring model, such that the highest scoring candidate reply is selected for output to the user as the final reply.

In some examples, a first best candidate reply is obtained by the retrieval model and a second best candidate reply is generated by the dialog generation model. Thus, the resulting set of candidate replies includes only two replies, namely the first best candidate reply and the second best candidate reply. Then only the first best candidate reply and the second best candidate reply need to be scored, the higher scoring one being output to the user as the final reply.

Reply scoring

Referring to FIG. 12, a schematic diagram of a scoring model for scoring replies according to an embodiment of the present specification is shown.

As shown in FIG. 12, the scoring model may be implemented using a BERT model, which may take the dialog preamble + the reply to be scored as input 1202, and use the classification of the reply ("match" or "no match") as output. In particular, a query (Q)-reply (A) pair for each session in the dialog preamble may be taken as input. For example, the dialog preamble may include multiple rounds of conversation and a query to be replied to. The multiple rounds of conversation may be processed as multiple query-reply pairs, with the last query to be replied to and the reply to be scored together constituting the last query-reply pair. As in the example of FIG. 12, Qm is the query to be replied to in the dialog preamble, and Am is the reply (e.g., candidate reply) to be scored, while the m-1 queries prior to Qm are the queries of the m-1 sessions in the dialog preamble, and the m-1 replies prior to Am are the corresponding replies in those m-1 sessions.

The input 1202 is processed using the BERT model and through associated other processing, such as CLS symbol processing 1206 and Softmax 1208, resulting in an output 1210 that indicates whether the reply matches the dialog preamble, together with a degree of match value (e.g., a value between 0 and 1) and/or a degree of mismatch value. In the case of outputting the degree of mismatch value, the degree of match value may be taken to be (1 - degree of mismatch). The degree of match value is the score of the reply.
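The relationship between the Softmax output and the reply score can be illustrated with a small sketch. The ordering of the two logits as `[no_match, match]` and the function names are assumptions for this example; in a real scoring model the logits would come from the classification layer on top of BERT.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def match_score(logits):
    """logits = [no_match_logit, match_logit]; return the degree of match."""
    return softmax(logits)[1]

logits = [0.0, 2.0]                 # hypothetical classifier output
mismatch, match = softmax(logits)
print(match_score(logits))
```

Because the two probabilities sum to 1, the degree of match can always be recovered as 1 minus the degree of mismatch.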

Based on the concepts disclosed herein, one of ordinary skill in the art may utilize the BERT model or variations thereof to implement the reply scoring operations in any manner different from the specific manner above.

The BERT model in fig. 12 is substantially the same as the models of fig. 6 and 7, except for the number of labels and classifications.

For example, for the role determination model of FIG. 6, a BERT model that performs a single-label binary classification task may be employed, where the single label is the role and the two classes are patient and non-patient (e.g., a smart doctor). In examples where more than two roles are included in the dialog preamble (e.g., including patient, patient assistant (e.g., doctor who helps patient express symptoms), and smart doctor), the role determination model may also employ a single-label multi-class classification model, where the single label is the role and the multiple classes may include the multiple roles.

For the entity prediction model of fig. 7, a BERT model that performs a multi-label multi-class classification task (and which may thus include Softmax) may be employed, where each label of the plurality of labels may be a different entity category. For example, for a smart doctor application, the labels may be one or more of a symptom, an incentive, a drug, and a treatment regimen of the patient, while the multiple classes are the specific entities in that category, e.g., for the drug category, aspirin, metoprolol tablets, and so forth.

For the scoring model of FIG. 12, a single-label binary classification task may be employed. For example, the single label may be whether the generated reply matches the dialog preamble, with the two classes being "yes" and "no", respectively.

In particular, to implement a BERT model that performs different types of tasks, an appropriate output activation function may be selected. For example, the Softmax function is particularly suitable for multi-class classification tasks, while either the Sigmoid function or the Softmax function is suitable for binary classification tasks; the Sigmoid function is particularly suitable for multi-label tasks, and so on. The function can be selected by a person skilled in the art according to actual needs or according to experiments. In addition, different numbers of neurons in the classification layer can also be used for different types of tasks. Those skilled in the art know how the BERT model for performing different types of tasks should be implemented, which will not be described in detail herein.
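To illustrate why the Sigmoid function suits multi-label tasks, the sketch below applies it independently to each label's logit, so that any number of labels can be predicted at once. The label names, the logit values, and the 0.5 threshold are assumptions for this example only.

```python
import math

def sigmoid(x):
    """Map a logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_multi_label(logits, labels, threshold=0.5):
    """Return every label whose independent probability exceeds the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) > threshold]

LABELS = ["symptom", "drug", "examination"]   # hypothetical label set
predicted = predict_multi_label([2.0, -1.0, 0.3], LABELS)
print(predicted)
```

With Softmax, by contrast, the probabilities compete and sum to 1, which fits tasks where exactly one class is chosen.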

For the scoring model to work well, it is critical that a good training set be constructed for training it. The training samples in the training set are dialog preamble-reply pairs (or query-reply pairs), and the label corresponding to each training sample is 1 or 0. For example, a 1 may indicate that the reply in the dialog preamble-reply pair is the actual reply to the dialog preamble, and a 0 may indicate that the reply in the dialog preamble-reply pair is not the actual reply to the dialog preamble.

It will be appreciated that positive examples may come from an actual conversation, while negative examples may be artificially constructed or selected. The quality of the artificially constructed negative examples is very important to the performance of the trained scoring model. In one example, the reply in a negative example may be a reply constructed by a machine learning model (e.g., a GPT model or a variant thereof) that is determined not to match the dialog preamble. In another example, the reply in a negative example may be a reply selected from other dialogs in the knowledge base. It will be appreciated that a large number of negative examples may be constructed or selected by combining dialog preambles with mismatched replies.

In order to improve the performance of the scoring model, it is preferable in the embodiment of the present specification to preferentially construct or select, as a negative example, a reply whose entities differ from the entities included in the reply in the actual dialog. For example, assuming that the entity included in the positive sample is "enteroscopy" (see, e.g., the example described above with reference to fig. 9), then in constructing or selecting the negative sample, a reply that includes an entity other than "enteroscopy" is preferably selected; a reply that does not include any entity may also be preferred.

It can be appreciated that even with filtering using the "different entities" criterion, a large number of negative examples can still be constructed or selected. Therefore, if all of the constructed or selected negative samples were placed in the training set together with the relatively few positive samples, the number of positive samples in the training set would be significantly less than the number of negative samples, which is not conducive to quickly training a high quality machine learning model. Therefore, random sampling is performed on the constructed or selected negative samples, and only the sampled negative samples are placed in the training set.

Preferably, the number of positive and negative examples in the training set is comparable. Preferably, the ratio of the number of negative samples to the number of positive samples in the training set is not greater than 10:1 and not less than 1:10. More preferably, the ratio of the number of negative samples to the number of positive samples in the training set is not greater than 3:1 and not less than 1:3. Most preferably, the ratio of the number of negative samples to the number of positive samples in the training set is about 1:1.
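The construction of a balanced training set for the scoring model might be sketched as follows. The function names, the data layout, the fixed random seed, and the sample replies are all assumptions made for illustration; the sketch only shows the two ideas of filtering negatives by entity difference and then randomly sampling them to a target ratio.

```python
import random

def pick_negative_replies(positive_entities, other_replies):
    """Keep replies whose entities do not overlap the positive reply's entities."""
    return [reply for reply, entities in other_replies
            if not (set(entities) & set(positive_entities))]

def build_training_set(positives, negatives, ratio=1.0, seed=0):
    """Label positives 1 and a random subset of negatives 0 (about ratio:1)."""
    rng = random.Random(seed)
    n_neg = min(len(negatives), round(len(positives) * ratio))
    sampled = rng.sample(negatives, n_neg)
    return [(pair, 1) for pair in positives] + [(pair, 0) for pair in sampled]

# Hypothetical data: one positive pair and replies taken from other dialogs.
preamble = "how to treat acute diarrhea"
positives = [(preamble, "An enteroscopy can find the cause.")]
other_replies = [
    ("Take aspirin.", ["aspirin"]),
    ("Rest well.", []),
    ("Book an enteroscopy.", ["enteroscopy"]),
]

negative_replies = pick_negative_replies(["enteroscopy"], other_replies)
negatives = [(preamble, reply) for reply in negative_replies]
training_set = build_training_set(positives, negatives, ratio=1.0)
print(training_set)
```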

Training of the scoring model is described above, and how the scoring model is applied to score the candidate replies and select the final reply is described below.

When a scoring model is applied for scoring (prediction), the inputted conversation preamble (or query) and each candidate reply in the candidate reply set form a conversation preamble-reply pair (or query-reply pair) and are inputted into the scoring model. The scoring model processes the input so as to obtain a judgment result of whether the reply is matched with the conversation preamble or not and corresponding matching degree and mismatching degree scores.

The highest scoring reply may then be selected as the final reply to the query in the dialog context for output to the user.
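The selection of the final reply from the candidate reply set can be sketched as follows. The word-overlap function `toy_score` is purely a hypothetical stand-in for the trained scoring model, and the candidate replies are made-up examples; only the "score every candidate, then take the highest" logic reflects the description above.

```python
def toy_score(preamble, reply):
    """Hypothetical scorer: fraction of reply words that also occur in the preamble."""
    preamble_words = set(preamble.lower().split())
    reply_words = reply.lower().split()
    if not reply_words:
        return 0.0
    return sum(w in preamble_words for w in reply_words) / len(reply_words)

def select_reply(preamble, candidates, score_fn):
    """Score each candidate against the preamble and return the best one."""
    scored = [(score_fn(preamble, c), c) for c in candidates]
    best_score, best_reply = max(scored, key=lambda item: item[0])
    return best_reply, best_score

retrieved = ["drink more water"]                          # from the retrieval model
generated = ["acute diarrhea may need an enteroscopy"]    # from the generation model
best, score = select_reply("how to treat acute diarrhea",
                           retrieved + generated, toy_score)
print(best, score)
```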

Automatic question answering method

Referring to FIG. 13, a schematic flow chart diagram of an example computer-implemented automated question-answering method 1300 in accordance with an embodiment of the present specification is shown. The details of the individual operations of this method have been described in detail in the foregoing.

As shown in fig. 13, method 1300 may include: at operation 1302, a conversation preamble can be obtained that includes a query to reply to.

In some examples, obtaining the conversation preamble may include, for example, receiving a complete conversation full text. All sessions in the conversation preamble may be obtained from web pages, desktop clients, smart phone applications, applets, SMS messages, video phones, etc. The dialog preambles may also be received from other databases.

In other examples, only the portion of the dialog preamble issued by the user may be received. For example, when multiple conversations are included in the conversation preamble, some conversations may be initiated by the user, while other conversations may be automatically generated by the system as described in embodiments herein. Thus, only sessions initiated by the user may be received, and sessions automatically generated by the system may be retrieved from the system's storage.

One or more sessions from a user may be received by the user's client in various ways. For example, a session from a user may be received by way of a web page, desktop client, smartphone application, applet, SMS message, video phone, etc. One or more conversations from the user may also be received by way of the telephone voice.

Specific details regarding the conversation preamble may be found in the description above, such as the contents of the "preprocessing" section above, and particularly the descriptions with respect to fig. 4-5.

Preferably, after the conversation preamble is received, pre-processing may be performed on it. Preferably, if the received conversation preamble contains non-text content, such as audio, video, or pictures, the non-text content may be converted into text before the pre-processing is performed. Such a conversion may be performed in any manner known to those skilled in the art. The pre-processing may also include operations such as data cleaning. Sample processing and selection may also preferably be performed on the conversation preamble. For example, when constructing a training set for training a dialog generation model or an entity determination model, replies that include entity information may be preferentially selected for placement in the training set.

Specific details regarding the pre-processing may be found in the description above, such as the description with respect to fig. 4-5.

The method 1300 may include: at operation 1304, one or more first candidate replies to the query may be retrieved in a question-and-answer knowledge base using a retrieval model based on the conversation preamble. Preferably, the operation may include converting the entries in the question-and-answer knowledge base and the query into vector representations, and using a vector retrieval method to search the vectorized question-and-answer knowledge base with the query vector. Preferably, the vector retrieval method may be implemented using FAISS.

Specific details regarding this operation may be found in reference to the description above, such as the contents of the "reply to retrieve" section above, and particularly the contents described with respect to fig. 10-11.
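The vector retrieval step can be illustrated with the following self-contained sketch. It is an assumption-laden toy: a normalized bag-of-words vector stands in for the retrieval model's embedding, and a brute-force cosine search stands in for a FAISS index; the function names and the knowledge-base layout (a list of question-answer pairs) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized bag-of-words vector (assumption)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {word: v / norm for word, v in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two normalized sparse vectors, i.e. cosine similarity."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def retrieve(
    query: str, knowledge_base: List[Tuple[str, str]], top_k: int = 2
) -> List[Tuple[float, str]]:
    """Return up to top_k (similarity, answer) pairs, most similar first.

    Brute-force search over every stored question; a vector index such as
    FAISS would replace this loop in a large knowledge base.
    """
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(question)), answer)
        for question, answer in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The returned similarity can serve as a rough degree of match for the retrieved first candidate replies.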

The method 1300 may include: at operation 1306, one or more second candidate replies to the query may be automatically generated using a dialogue generation model based on the conversation preamble, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies.

Preferably, role information associated with the dialog preamble, such as described above with reference to fig. 6, may be determined. The conversation preamble can then be processed based on the role information as input to the conversation generation model.

Preferably, entity information associated with the reply may be predicted, such as described above with reference to fig. 7. The predicted entity information may then be used as input to the conversation generation model along with the conversation preamble.

As described above, replies including entity information are preferably selected to be placed into a training set when training the dialog generation model.

For specific details regarding the reply generation operation, reference may be made to the description above, such as the contents of the "reply generation" section above, and particularly the contents described with respect to fig. 8-9.

The method 1300 may include: at operation 1308, candidate replies in the set of candidate replies may be scored using a scoring model. As described above, the scoring model is trained based on a set of dialog samples (i.e., the training set described above with reference to fig. 12), each dialog sample including a dialog preamble-reply pair, the dialog samples being divided into positive samples and negative samples, wherein a reply in a positive sample matches a dialog preamble and a reply in a negative sample does not match a dialog preamble. Preferably, for the same conversation preamble, the entity associated with the reply in the negative sample is different from the entity associated with the reply in the positive sample.
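One way to assemble such positive and negative samples is sketched below. The data layout (replies paired with a single entity label) and the function name are assumptions made for illustration; the entity labels "drug A" and "drug B" are placeholders.

```python
from typing import Dict, List, Tuple, Union

Sample = Dict[str, Union[str, int]]


def build_samples(
    preamble: str,
    positive_reply: str,
    positive_entity: str,
    reply_pool: List[Tuple[str, str]],
) -> List[Sample]:
    """Build one positive sample plus negatives for the same preamble.

    reply_pool holds (reply, entity) pairs. A pooled reply becomes a
    negative sample only if its entity differs from the positive entity,
    so that negatives are clearly mismatched for this preamble.
    """
    samples: List[Sample] = [
        {"preamble": preamble, "reply": positive_reply, "label": 1}
    ]
    for reply, entity in reply_pool:
        if entity != positive_entity:
            samples.append({"preamble": preamble, "reply": reply, "label": 0})
    return samples
```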

Specific details regarding this operation may be found in reference to the description above, such as the contents of the "reply scoring" section above, particularly that described with respect to FIG. 12.

The method 1300 may include: at operation 1310, a reply to the query may be selected from the set of candidate replies based on the results of scoring the candidate replies in the set of candidate replies. For example, the highest scoring candidate reply may be selected as the reply to the query.

This operation may also include outputting a reply to the query to the user. For example, it may be output to a user's client by way of a web page, desktop client, smartphone application, applet, SMS message, video phone, etc. Alternatively, the output may be to the user's telephone by way of telephone speech.

Preferably, the query and the reply to the query may be saved to the question-and-answer repository. In this way, the question-answer repository can be continuously enriched so that saved replies can be retrieved directly from the question-answer repository when subsequent identical or similar queries are encountered.

It may be appreciated that in some examples, a query (or conversation preamble) in the question-and-answer knowledge base may match the obtained query particularly well, in which case the reply retrieved from the question-and-answer knowledge base may be selected directly without using the conversation generation model to generate candidate replies.

Referring to FIG. 14, a schematic flow chart diagram of another example automatic question-answering method 1400 in accordance with an embodiment of the present specification is shown.

As shown in fig. 14, the method 1400 may include: at operation 1402, a conversation preamble can be obtained that includes a query to reply to. Specific details of this operation are described above with respect to operation 1302.

The method 1400 may include: at operation 1404, one or more first candidate replies to the query may be retrieved in a question-and-answer knowledge base using a retrieval model based on the conversation preamble. Specific details of this operation are described above for operation 1304.

The method 1400 may include: at operation 1406, a degree of match of the one or more retrieved first candidate replies to the query may be determined.

The method 1400 may include: at operation 1408, a first candidate reply with a highest degree of match of the one or more first candidate replies may be determined.

The method 1400 may include: at operation 1410, it may be determined whether the highest degree of match is greater than a threshold degree of match.

The method 1400 may include: at operation 1412, if the highest degree of match is greater than the threshold degree of match, the first candidate reply with the highest degree of match may be directly taken as the reply to the query. In this case, the reply generation operation need not be performed.

The method 1400 may include: at operation 1414, if the highest degree of match is not greater than the threshold degree of match, one or more second candidate replies to the query may be automatically generated using a conversation generation model based on the conversation preamble, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies. Specific details of this operation are described above for operation 1306.

The method 1400 may include: at operation 1416, candidate replies in the set of candidate replies may be scored using a scoring model. Specific details of this operation are described above for operation 1308.

The method 1400 may include: at operation 1418, a reply to the query may be selected from the set of candidate replies based on the results of scoring of candidate replies in the set of candidate replies. Specific details of this operation are described above with respect to operation 1310.
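The control flow of method 1400, in which reply generation is skipped when retrieval alone already yields a sufficiently good match, can be sketched as follows. The three callables are hypothetical placeholders for the retrieval model, the conversation generation model, and the scoring model, and the default threshold of 0.8 is an arbitrary assumption.

```python
from typing import Callable, List, Optional, Tuple


def reply_with_shortcut(
    preamble: str,
    retrieve_fn: Callable[[str], List[Tuple[float, str]]],
    generate_fn: Callable[[str], List[str]],
    score_fn: Callable[[str, str], float],
    match_threshold: float = 0.8,
) -> Optional[str]:
    """Return a reply, generating candidates only when retrieval is weak."""
    # Retrieve first candidates as (degree of match, reply) pairs.
    first_candidates = retrieve_fn(preamble)
    if first_candidates:
        best_degree, best_reply = max(first_candidates, key=lambda p: p[0])
        # A strong enough match is returned directly; no generation needed.
        if best_degree > match_threshold:
            return best_reply
    # Otherwise build the candidate set from retrieved and generated replies.
    candidate_set = [reply for _, reply in first_candidates]
    candidate_set += list(generate_fn(preamble))
    if not candidate_set:
        return None
    # Score every candidate and select the highest-scoring one.
    return max(candidate_set, key=lambda reply: score_fn(preamble, reply))
```

Skipping generation on a strong retrieval hit saves the cost of running the generation and scoring models for queries the knowledge base already answers well.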

Automatic question-answering system

Referring to FIG. 15, a schematic diagram of an example automatic question-answering system 1500 in accordance with an embodiment of the present description is shown.

As shown in fig. 15, the automated question-answering system 1500 may include a retrieval model 1502, a dialog generation model 1504, a scoring model 1506, and a reply module 1508. The specific details of each model or module may be found in the description of the relevant operation above.

The retrieval model 1502 may be used to retrieve one or more first candidate replies to queries included in a conversation preamble in a question-and-answer knowledge base based on the conversation preamble, as described above with reference to the reply retrieval operation.

Dialog generation model 1504 may be used to automatically generate one or more second candidate replies to the query based on the preceding part of the dialog, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies, as described above with reference to the reply generation operation.

Scoring model 1506 may be used to score candidate replies in the set of candidate replies, as described above with reference to the reply scoring operation.

Reply module 1508 may be operative to select a reply to the query from the set of candidate replies based on a result of scoring of a candidate reply in the set of candidate replies, as described above with reference to operations 1310 or 1418.

Optionally, the automated question-answering system 1500 may further include a pre-processing module 1510 operable to perform pre-processing on the conversation preambles, wherein the pre-processing module is operable to preferentially select sentences containing entity information to be placed into a training set when training the conversation generation model, as described above with reference to the pre-processing operation.

Optionally, automated question-answering system 1500 may also include a role determination model 1512 that may be used to determine role information associated with the conversation preambles, where the conversation preambles are processed based on the role information as input to the conversation generation model, as described above with reference to role determination operations.

Optionally, the automated question-and-answer system 1500 may also include an entity prediction model 1514 that may be used to predict entity information associated with the reply, where the predicted entity information is used as input to the conversation generation model along with the conversation preamble, as described above with reference to entity prediction operations.

In a preferred example, the retrieval model may be further operable to: determine a degree of match of the retrieved one or more first candidate replies to the query; determine the first candidate reply with the highest degree of match among the one or more first candidate replies; and, if the highest degree of match is greater than a threshold degree of match, directly use the first candidate reply with the highest degree of match as the reply to the query.

Optionally, automated question and answer system 1500 may also include a save module 1516 operable to save the query and the reply to the query to the question and answer repository.

Method for automatically replying to medical consultation

Building on the automatic question answering method introduced above, the embodiments described below introduce further improvements that make the method particularly suitable for automatically replying to medical consultations, and especially for providing remote medical consultation to people in remote mountain villages.

It should be appreciated that details of the various embodiments described above are applicable to the embodiments described below and, therefore, are not described again, unless otherwise indicated.

Referring to fig. 16, a schematic flow diagram of a computer-implemented method 1600 of automatically replying to a medical consultation is shown, in accordance with embodiments of the present description.

The method according to embodiments of the present description may be implemented in various ways.

In a first implementation, the method may be used directly by the patient. In this case, at least a portion of the conversation preamble may be issued by the patient.

In a second implementation, a physician may participate in the method, in which case aspects of embodiments of the present specification may be used to assist the physician in making a diagnosis. In this case, at least a portion of the conversation preamble may be issued by the doctor.

In a third implementation, the patient and a patient assistant may both participate in the method. The patient assistant may be, for example, a doctor. In this case, at least a portion of the conversation preamble may be issued by the patient and at least another portion may be issued by the patient assistant, such as a doctor. Information for the medical consultation, such as symptoms, may thus be provided by the patient and the doctor together.

In a fourth implementation, a non-human role may participate in the method in addition to the participants described above. Such non-human roles may include, for example, health condition collection devices or medical record systems, as described in more detail below.

In the following description, the first implementation described above is mainly taken as an example, and in this case, the user is usually referred to as a patient. It should be appreciated that similar operations may be equally applicable to other implementations, in which case the user may also refer to a doctor or other patient assistant.

As shown in fig. 16, method 1600 may include: at operation 1602, a medical conversation preamble may be obtained. The medical conversation preamble may include, for example, a medical consultation. The medical conversation preamble may be related to the health condition of the patient. Specific details of this operation may be found in the description above for operation 1302.

Medical consultation may present several unique problems, especially in remote mountain villages. First, a medical consultation may be made in a relatively urgent situation, in which the user may be unable or unwilling to use newer service channels such as apps, applets, and web pages, and may instead prefer a conventional telephone call (i.e., telephone voice). Second, users who are less familiar with new technology, such as the elderly and people in remote mountain villages, may not even have a wireless network or an internet access device, and thus may only be able to, or may prefer to, make a consultation by telephone.

For such a situation, in a preferred embodiment of the present specification, obtaining the medical session preamble may include receiving a telephone voice from the user. Accordingly, in subsequent operations (e.g., in a pre-processing stage), the telephone speech may be converted to text. Further, when a reply to the medical consultation is determined and the reply is output to the user, the reply may be converted into a phone voice and the phone voice may be output to the user's phone (e.g., played to the user during a phone call in which the medical consultation is made).

In this way, the coverage of the medical consultation service can be greatly improved, which is particularly beneficial for providing automatic medical consultation services to people in remote mountain villages or to the elderly.

Another possible problem is that people in remote mountain villages or the elderly may speak with an accent or in a dialect. For example, they may be more accustomed to using a dialect than Mandarin, or may not speak Mandarin at all.

If speech is included in the received conversation preamble and the speech is in a dialect (e.g., in the previous example, the telephone voice is in a dialect), then in a preferred embodiment of the present specification, the dialect speech may also be converted to Mandarin text. Accordingly, when a reply is output to the user, the reply may be converted into the dialect, so that a user who is not accustomed to or does not understand Mandarin can understand the reply.

In this way, the coverage rate of the medical consultation service is further improved, and the method is particularly friendly to dialect-only users.

In a preferred embodiment of the present specification, an instruction to collect health condition data of the patient may be transmitted to a health condition collection device based on the conversation preamble.

In an implementation in which the method of the present invention directly serves a patient (as in the first implementation above), the user of an embodiment of the present invention is the patient. That is, at least a portion of the conversation preamble is issued by the patient. In this case, the health condition data of the patient may be collected using a health condition collection device at the patient's location.

For example, it may first be determined whether a health condition collection device is available at the user's location. Examples of health condition collection devices include, but are not limited to: a sphygmomanometer, a blood glucose meter, a thermometer, a camera, etc. With such devices, it is possible to collect health condition data of the user, such as a blood pressure value, a blood glucose value, a body temperature value, an image of the user's whole body or a part thereof (such as a diseased site), and the like. In a preferred embodiment of the present specification, these health condition collection devices may be networked so that a system as described in embodiments of the present specification can send instructions to them over a network to direct their operation and can receive the data they collect. In a preferred embodiment, a health condition collection kit, which may include one or more health condition collection devices, may be sent to a user who subscribes to a medical consultation service provided according to embodiments of the present specification.

Before the health condition data is collected, it is preferable to determine whether it is necessary to collect health condition data of the user and what kind of health condition data should be collected. This determination may be performed automatically by a system as described herein. For example, a machine learning model may be trained to discover associations between a symptom described by the user in the conversation preamble and health condition data that may help interpret the symptom, so that whether to collect health condition data, and which data to collect, can be determined based on the conversation preamble.

Prior to collecting the health data, patient consent is preferably obtained. For example, a request to collect health condition data of a patient may be displayed or played to the patient at the time the patient registers for use of the medical advice service or before the health condition data is collected, and the health condition data of the user may be collected if the user approves the request by text or voice or a gesture (e.g., a nodding gesture).
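The decision of which health condition data to collect, gated on the patient's consent, can be sketched as below. Note that this sketch swaps in a plain keyword table where the text above describes a trained machine learning model; the table entries, measurement names, and function name are purely illustrative assumptions.

```python
from typing import Dict, List, Set

# Illustrative keyword table (an assumption); the specification describes
# learning such symptom-to-data associations with a machine learning model.
SYMPTOM_TO_MEASUREMENT: Dict[str, str] = {
    "dizzy": "blood_pressure",
    "fever": "body_temperature",
    "thirsty": "blood_glucose",
}


def plan_collection(
    preamble: str, available_devices: Set[str], consent_given: bool
) -> List[str]:
    """Return the measurements to request, or nothing without consent."""
    if not consent_given:
        return []
    words = set(preamble.lower().split())
    wanted = [
        measurement
        for symptom, measurement in SYMPTOM_TO_MEASUREMENT.items()
        if symptom in words
    ]
    # Only request data that a device at the user's location can provide.
    return [m for m in wanted if m in available_devices]
```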

After the health condition data is collected, it may be received from the health condition collection device. The health condition data may then be converted to text as part of the conversation preamble. For example, the blood pressure value, blood glucose value, body temperature value, etc. of the user can be directly converted into text and added to the conversation preamble.
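A minimal sketch of converting numeric readings into text turns for the conversation preamble is shown below. The measurement names, the sentence templates, the units, and the "device" role label are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

Reading = Union[float, int, Tuple[int, int]]

# Hypothetical sentence templates; units are assumed, not from the source.
TEMPLATES: Dict[str, str] = {
    "blood_pressure": "Blood pressure: {0}/{1} mmHg.",
    "blood_glucose": "Blood glucose: {0} mmol/L.",
    "body_temperature": "Body temperature: {0} degrees Celsius.",
}


def readings_to_turns(readings: Dict[str, Reading]) -> List[Tuple[str, str]]:
    """Convert device readings into (role, text) turns for the preamble."""
    turns: List[Tuple[str, str]] = []
    for name, value in readings.items():
        template = TEMPLATES.get(name)
        if template is None:
            continue  # Skip measurements with no known text template.
        values = value if isinstance(value, tuple) else (value,)
        turns.append(("device", template.format(*values)))
    return turns
```

The resulting turns could then be appended to the turns issued by the patient before the preamble is passed to the retrieval and generation models.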

Where the collected data comprises image data, image processing may be performed on the image to convert it into a part of the conversation preamble. For example, when an image of a diseased site of the user is taken, image recognition (e.g., using a machine learning model) may be performed to identify a wound or suppuration in the image and generate text, and the text may be added to the conversation preamble. The text may describe the result of the image recognition; for example, the text may include "wound present", "eye redness", or "elbow suppuration", among others.

By providing the health condition data collected by the health condition collection device, the replies generated according to embodiments of the present specification can be made more accurate.

In an example where the method of the present invention is used to assist a physician in making a diagnosis, the patient may be at the physician's location and be undergoing diagnosis by the physician. In this case, the participants in the conversation preamble may include the doctor (or both the doctor and the patient), and the required health condition data collection instruction may be transmitted to a health condition collection device at the doctor's location. The health condition collection device at the doctor's location may be any device for collecting health condition data of a patient, including large medical instruments such as a nuclear magnetic resonance device. Preferably, these devices are networked with a system according to embodiments of the invention so that the system can send data collection instructions to them.

Likewise, prior to collecting health data, authorization of the patient may first be obtained. In some embodiments, the authorization of the doctor may additionally or alternatively be obtained first.

In some preferred embodiments, at least a portion of the patient's medical record may be automatically obtained from a medical facility based on the conversation preamble. The medical record may include, for example, case history data as well as previously acquired data indicative of the health condition of the patient, such as previously taken X-ray images, magnetic resonance images, electrocardiograms, blood test results, and the like.

Before the patient's medical record is obtained, it is preferable to determine whether it is necessary to obtain the medical record and what kind of medical record data should be obtained. This determination may be performed automatically by a system as described herein. For example, a machine learning model may be trained to discover associations between a symptom described by the user in the conversation preamble and medical record data that may help interpret the symptom, so that whether to obtain the patient's medical record, and which medical record data to obtain, can be determined based on the conversation preamble. Subsequently, a search may be performed in the medical record to obtain all or a portion of it.

Preferably, the patient's authorization may be obtained before at least a portion of the patient's medical record is obtained. Additionally or alternatively, authorization of a doctor or of a medical records authority may be obtained.

At least a portion of the obtained medical record may then be converted to text as part of the conversation preamble. For example, image processing or the like may be performed on images in the medical record to obtain text associated with the images. For example, text recognition may be performed using techniques such as Optical Character Recognition (OCR), or image recognition may be performed using a machine learning model (as described above), to convert at least a portion of the obtained medical record into text. The details are not repeated herein.

The method 1600 may include: at operation 1604, one or more first candidate replies to the medical consultation may be retrieved in a medical question-and-answer knowledge base using a retrieval model based on the medical conversation preamble. Specific details of this operation may be found in the description above for operation 1304.

The method 1600 may include: at operation 1606, one or more second candidate replies to the medical consultation may be automatically generated using a dialogue generation model based on the medical dialogue preamble, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies. The specific details of this operation may be referenced above with respect to the description of operation 1306.

Preferably, role information associated with the medical conversation preamble may be determined. The role information indicates the role of the originator of each utterance in the conversation preamble. In a preferred embodiment of the present specification, the role information indicates whether the sender of an utterance in the conversation preamble is a patient or a doctor.

In the first implementation described above, the conversation preamble includes two roles: the patient and the smart doctor. In this case, the role information may indicate whether the sender of an utterance in the conversation preamble is the patient or the doctor.

In the second implementation described above, the conversation preamble includes two roles: a human doctor who performs a diagnosis for the patient, and the smart doctor. For example, a system according to an embodiment of the present specification may be used by a human doctor to assist in diagnosis. In this case, the human doctor may input the patient's symptoms on the patient's behalf. The human doctor may then be regarded as the patient or as representing the patient.

In the third implementation described above, the conversation preamble may include more roles, for example a patient, a patient assistant (e.g., a human doctor), and a smart doctor. The patient assistant may help the patient supplement relevant information. For example, the patient assistant may be a parent of the patient. The patient assistant may also be a doctor; for example, in the case of medical care in remote mountain villages, the patient assistant may be a rural doctor. Rural doctors are often able to describe the relevant symptoms more clearly than rural residents, especially those in remote mountain areas. Introducing a rural doctor into the conversation preamble as a party making the consultation allows a better symptom description to be provided and thus a higher-quality reply to be obtained. Thus, embodiments of the present invention can provide higher-quality replies by having a rural doctor appear in the conversation preamble either instead of the patient (e.g., the interlocutors are a rural doctor and a smart doctor, where the smart doctor may be implemented by the system of the present invention) or in addition to the patient (e.g., the interlocutors are a patient, a rural doctor, and a smart doctor).

In the fourth implementation described above, the conversation preamble may include other, non-human roles. For example, some information in the conversation preamble may be obtained or collected by a system as described in embodiments herein. For example, where health condition data of the user is collected by a health condition collection device, the health condition collection device may also act as a participant in the conversation. As another example, where previous medical records of the user are obtained, the medical record system may also act as a participant in the conversation.

By classifying roles at a finer granularity, the solution of the embodiments of the present specification can more accurately understand the information in the conversation preamble, thereby improving the quality of the output reply.
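One possible way to make role information available to the conversation generation model is to prefix each utterance with a role tag, as sketched below. The tag format, the role names, and the function name are assumptions made for illustration; the specification does not prescribe this encoding.

```python
from typing import List, Tuple

# Hypothetical role labels covering the four implementations above.
KNOWN_ROLES = {
    "patient",
    "patient_assistant",
    "smart_doctor",
    "device",
    "record_system",
}


def tag_turns(turns: List[Tuple[str, str]], sep: str = " ") -> str:
    """Serialize (role, text) turns into one role-tagged input string."""
    parts = []
    for role, text in turns:
        if role not in KNOWN_ROLES:
            raise ValueError(f"unknown role: {role}")
        parts.append(f"[{role.upper()}] {text.strip()}")
    return sep.join(parts)
```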

Preferably, entity information associated with the reply may be predicted, such as described above with reference to fig. 7. The predicted entity information may then be used as input to the conversation generation model along with the conversation preamble. For medical consultation, in a preferred embodiment of the present specification, the entity information may relate to one or more of a user's symptoms, causes, drugs, treatment regimens, for example. By determining symptoms, causes, drugs, or treatment regimens that may be involved in the reply, embodiments of the present disclosure may be more targeted in generating the reply, thereby enhancing the quality of the reply for medical advice.

As described above, a reply including entity information is preferably selected to be placed into a training set when training an entity prediction model or a dialog generation model. For example, replies that include one or more of the user's symptoms, causes, drugs, and treatment regimens may be preferentially selected for placement in the training set.

In a preferred embodiment, the dialog generation model is trained using a dialog library associated with a particular region. It will be appreciated that people in different regions may have different habits of expression. Such differences do not mean, or do not only mean, that a dialect of a particular region is used; rather, people in different regions may express the same content in different ways. For example, patients in certain areas of the north may use the following expression: "yesterday's first belly pain", while patients in certain regions of the south may express the same meaning with the expression: "the first belly pain yesterday". Thus, training the dialog generation model using a dialog library associated with a particular region may make the resulting model better adapted to users of that region.

In addition, the probability of suffering from different diseases may differ between regions due to differences in living habits and the like. For example, in regions where people are accustomed to eating very spicy food, the probability of gastrointestinal disease is relatively high. To reflect this, in a preferred embodiment, region information may also be used as entity information input to the dialog generation model, so as to generate a more targeted reply.

In the above example, the region information may, for example, be received from a client device of the user (e.g., a client device having a positioning function) and/or be input by the user.
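Supplying predicted entity information and region information to the dialog generation model together with the conversation preamble can be sketched as a simple input-assembly step. The field markers and the function name below are hypothetical; how a real model consumes these fields is not specified here.

```python
from typing import Iterable, Optional


def build_generation_input(
    preamble: str, entities: Iterable[str], region: Optional[str] = None
) -> str:
    """Combine region, predicted entities, and preamble into one input."""
    fields = []
    if region:
        fields.append(f"[REGION] {region}")
    entity_list = [entity for entity in entities if entity]
    if entity_list:
        fields.append("[ENTITIES] " + ", ".join(entity_list))
    # The preamble always comes last so optional fields act as a prefix.
    fields.append(f"[PREAMBLE] {preamble}")
    return " ".join(fields)
```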

The method 1600 may include: at operation 1608, the candidate replies in the set of candidate replies may be scored using a scoring model. Specific details of this operation may be referenced above with respect to the description of operation 1308.

As described above, the scoring model is trained based on a set of medical conversation samples, each conversation sample including a medical conversation preamble-reply pair, the conversation samples being divided into positive samples and negative samples, wherein a reply in a positive sample matches the medical conversation preamble and a reply in a negative sample does not match the medical conversation preamble. Preferably, for the same conversation preamble, the entity associated with the reply in the negative sample is different from the entity associated with the reply in the positive sample.

The method 1600 may include: at operation 1610, a reply to the medical advice may be selected from the set of candidate replies based on the scoring results for the candidate replies in the set of candidate replies. Specific details of this operation may be referenced above with respect to the description of operation 1310.

Preferably, the query and the reply to the query may be saved to the knowledge base of questions and answers. In this way, the question-answer repository can be continuously enriched so that saved replies can be retrieved directly from the question-answer repository when subsequent identical or similar queries are encountered.

Preferably, based on the generated reply, medical emergency services may be automatically called for the patient. For example, if the reply includes a suggestion that the user seek immediate treatment, or otherwise indicates a situation requiring emergency services, medical emergency services such as the 120 emergency service may be automatically called for the patient. The 120 emergency service called may be that of the region in which the patient is located. Preferably, the user's authorization may be obtained before emergency services are called.

Preferably, based on the generated reply, an appointment may be automatically registered for the patient at a hospital. For example, if the reply includes the word "enteroscope," an enteroscopy may be automatically booked for the patient at a hospital, and/or an appointment may be registered with a gastroenterology department. Preferably, the user's authorization is obtained before the appointment is registered. Preferably, hospitals and/or departments in the region where the patient is located may be searched automatically (e.g., through a map service or a POI service), and the hospital and/or department for registration may be selected according to the search results. Alternatively, a list of candidate hospitals and/or departments may be presented to the patient, who then makes the selection.

The operations of calling medical emergency services and making appointment registrations may be determined by one or more modules of the system according to embodiments of the present specification based on rules or based on machine learning models.
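A rule-based variant of this decision could look like the sketch below. The keyword lists, the `decide_actions` name, and the `authorized` flag are illustrative assumptions; the specification does not fix any particular rules, and a machine learning model could take their place.

```python
# Assumed trigger phrases; a real deployment would need a vetted list.
EMERGENCY_HINTS = ("seek immediate treatment", "call an ambulance")
# Assumed mapping from a keyword in the reply to a hospital department.
APPOINTMENT_HINTS = {"enteroscope": "gastroenterology"}


def decide_actions(reply, authorized):
    """Return the follow-up actions suggested by a reply.

    Nothing is triggered unless the user has authorized it, which follows
    the stated preference for obtaining authorization first.
    """
    if not authorized:
        return []
    text = reply.lower()
    actions = []
    if any(hint in text for hint in EMERGENCY_HINTS):
        actions.append(("call_emergency", None))
    for keyword, department in APPOINTMENT_HINTS.items():
        if keyword in text:
            actions.append(("book_appointment", department))
    return actions
```

The function only decides which actions to propose. Placing the call or registering the appointment would be handled by separate components.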

Preferably, the method may further comprise storing the reply in a knowledge base of medical questions and answers.

Upon receiving the reply, the patient (or the patient's assistant) may pursue further diagnosis or treatment based on the advice in the reply. In some cases, such further diagnosis or treatment leads to good results, e.g., a clear diagnosis of the disease or effective treatment. In a preferred embodiment of the present description, the method may further comprise selectively saving the reply in the medical question-answer knowledge base based on feedback on the reply (e.g., the result of the further diagnosis or treatment). For example, only replies that receive good feedback may be saved in the medical question-answer knowledge base.

By storing replies in the medical question-answer knowledge base according to their feedback, the content of the knowledge base can be expanded and its quality improved.
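Feedback-gated saving can be sketched as below. The in-memory `QAKnowledgeBase` class, the numeric `feedback_score` in [0, 1], and the `threshold` value are assumptions for illustration; the description only states that replies with good feedback may be saved.

```python
class QAKnowledgeBase:
    """In-memory stand-in for the medical question-answer knowledge base."""

    def __init__(self):
        self.entries = {}  # consultation text -> list of saved replies

    def save_if_good(self, consultation, reply, feedback_score, threshold=0.8):
        """Save the pair only when the feedback reaches the threshold.

        Returns True if the reply was accepted, False if it was rejected.
        """
        if feedback_score < threshold:
            return False
        replies = self.entries.setdefault(consultation, [])
        if reply not in replies:  # avoid storing the same reply twice
            replies.append(reply)
        return True
```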

In a preferred embodiment, a transfer learning model can also be used to perform transfer learning based on entries in the medical question-answer knowledge base, so as to adapt to other regions or other scenarios. For example, after a medical question-answer knowledge base has been built mainly from non-remote areas such as cities, it can be transferred to remote mountain village scenarios. This mitigates the scarcity of conversation samples from remote villages and the resulting cold-start problem. Alternatively, samples from one region may be transferred to another region, thereby addressing the shortage of samples in some regions.

By combining the saving of replies that receive good feedback with transfer learning, the range of application of the embodiments of the present specification can be greatly expanded, especially where samples are scarce, as in remote villages.

It may be appreciated that in some examples, a query (or conversation preamble) in the question-answer knowledge base may exactly match the obtained query, in which case the reply retrieved from the question-answer knowledge base may be selected directly, without using the conversation generation model to generate candidate replies. For specific examples, see the description above with reference to fig. 14; a detailed description is omitted here.
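The shortcut can be sketched as follows, assuming that a "match" means equality after light normalization. The `normalize` and `reply_with_shortcut` helpers and the `generate_and_rank` callable are hypothetical names standing in for the components described above.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def reply_with_shortcut(consultation, knowledge_base, generate_and_rank):
    """Return a stored reply on a match, otherwise use the full path.

    `knowledge_base` maps normalized consultations to replies.
    `generate_and_rank` stands in for candidate generation followed by
    scoring; it is invoked only when no stored match exists.
    """
    key = normalize(consultation)
    if key in knowledge_base:
        return knowledge_base[key], "retrieved"
    return generate_and_rank(consultation), "generated"
```

Skipping generation on a match avoids running the generation and scoring models for consultations that already have a saved reply.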

System for automatically replying to medical consultation

Referring to fig. 17, a schematic diagram of an example system 1700 for replying to a medical consultation in accordance with an embodiment of the present description is shown.

As shown in FIG. 17, the system 1700 may include a retrieval model 1702, a dialog generation model 1704, a scoring model 1706, and a reply module 1708. The specific details of each model or module may be found in the description of the relevant operation above.

The retrieval model 1702 may be used to retrieve, from a medical question-answer knowledge base and based on the medical session preamble, one or more first candidate replies to the medical consultation included in the medical session preamble, as described above with reference to reply retrieval operations and operation 1604.

The dialog generation model 1704 may be used to automatically generate one or more second candidate replies to the medical consultation based on the medical dialog preamble, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies, as described above with reference to reply generation operations and operation 1606. Preferably, as described above, the dialog generation model may be trained using a dialog library associated with a particular region.

Scoring model 1706 may be used to score candidate replies in the set of candidate replies, as described above with reference to reply scoring operations and operation 1608.

The reply module 1708 may be used to select a reply to the query from the set of candidate replies based on the results of scoring of a candidate reply in the set of candidate replies, as described above with reference to operations 1310, 1418, or 1610.
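How the retrieval model, the dialog generation model, the scoring model, and the reply module fit together can be sketched as below. This is an illustrative arrangement under stated assumptions: each model is represented by a plain callable, duplicate candidates are dropped, and the reply step picks the highest score. None of these choices is prescribed by the description.

```python
from typing import Callable, List


class ReplySystem:
    """Combines three models, passed in as callables, with a reply step."""

    def __init__(
        self,
        retrieve: Callable[[str], List[str]],  # stands in for the retrieval model
        generate: Callable[[str], List[str]],  # stands in for the generation model
        score: Callable[[str, str], float],    # stands in for the scoring model
    ):
        self.retrieve = retrieve
        self.generate = generate
        self.score = score

    def candidate_set(self, preamble: str) -> List[str]:
        """Retrieved (first) candidates followed by generated (second) ones."""
        candidates = self.retrieve(preamble) + self.generate(preamble)
        # Drop duplicates while preserving order (an assumption of this sketch).
        return list(dict.fromkeys(candidates))

    def reply(self, preamble: str) -> str:
        """Reply step: return the candidate with the highest score."""
        candidates = self.candidate_set(preamble)
        if not candidates:
            raise ValueError("no candidate replies available")
        return max(candidates, key=lambda c: self.score(preamble, c))
```

Because retrieved and generated candidates pass through the same scorer, a generated reply can win when the knowledge base holds nothing closely related, and a retrieved reply can win when it does.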

Optionally, the system 1700 may further include a preprocessing module 1710 operable to preprocess the conversation preamble, wherein, when the conversation generation model is trained, the preprocessing module is operable to preferentially select statements containing entity information for the training set, as described above with reference to preprocessing operations. Preferably, the preprocessing module is operable to convert telephone speech received from the user into text, and the reply module is further operable to convert the reply into telephone speech in dialect form.

Optionally, the system 1700 may further comprise a role determination model 1712 operable to determine role information associated with the conversation preamble, wherein the conversation preamble is processed based on the role information and then used as input to the conversation generation model, as described above with reference to the role determination operations. For example, the role information indicates whether the sender of an utterance in the conversation preamble is a patient or a doctor.
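One simple way to process the preamble based on role information is to prefix each utterance with a role tag, as in the sketch below. The tag format and the `tag_roles` helper are assumptions for illustration, and `determine_role` stands in for the role determination model.

```python
def tag_roles(utterances, determine_role):
    """Prefix each utterance with its role before it reaches the generator.

    `determine_role(text)` is assumed to return "patient" or "doctor".
    """
    tagged = []
    for text in utterances:
        role = determine_role(text)
        if role not in ("patient", "doctor"):
            raise ValueError(f"unexpected role: {role!r}")
        tagged.append(f"[{role}] {text}")
    return " ".join(tagged)
```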

Optionally, the system 1700 may further include an entity prediction model 1714 operable to predict entity information associated with the reply, wherein the predicted entity information is used as input to the conversation generation model along with the conversation preamble, as described above with reference to entity prediction operations. Preferably, the entity information relates to one or more of symptoms, causes, drugs, treatment regimens of the patient.
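Feeding predicted entities to the generation model together with the preamble can be sketched as a flat text suffix. The `[entities]` marker, the `type=value` format, and the `build_generator_input` helper are assumptions of this example and not a format defined in the description.

```python
# Entity types named in the description, in a fixed output order.
ENTITY_TYPES = ("symptom", "cause", "drug", "treatment")


def build_generator_input(preamble, predicted_entities):
    """Append predicted entities to the preamble as a text suffix.

    `predicted_entities` maps an entity type to a list of entity strings.
    Types outside ENTITY_TYPES are ignored.
    """
    parts = []
    for entity_type in ENTITY_TYPES:
        for value in predicted_entities.get(entity_type, []):
            parts.append(f"{entity_type}={value}")
    if not parts:
        return preamble
    return f"{preamble} [entities] " + "; ".join(parts)
```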

Optionally, system 1700 may further comprise a saving module 1716 operable to save the medical consultation and the reply to the medical consultation to the question-answer repository. Preferably, the saving module may selectively save the reply in the knowledge base of medical questions and answers based on the feedback of the reply. The specific details of the operation of this module may be found in the description above.

Preferably, the system 1700 may further include a medical profile acquisition module (not shown). The medical profile acquisition module may be configured to automatically acquire at least a portion of a medical profile of a patient from a medical facility based on the session preamble, wherein the at least a portion of the medical profile is included as part of the session preamble. The operational details of the module may be found in the description above.

Preferably, the system 1700 may further include a health data acquisition module (not shown). The health data acquisition module may be operable to: send an instruction for acquiring health condition data of the patient to a health condition acquisition device based on the session preamble; receive the health condition data from the health condition acquisition device; and convert the health condition data to text as part of the session preamble. The specific details of the operation of this module may be found in the description above.
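The conversion of device readings to text can be sketched as below. The reading names, the units, and the `health_data_to_text` helper are hypothetical; communication with the acquisition device is outside the scope of this example.

```python
def health_data_to_text(readings):
    """Render device readings as a sentence that can join the preamble.

    `readings` maps a measurement name to a (value, unit) tuple.
    Returns an empty string when there are no readings.
    """
    if not readings:
        return ""
    parts = [f"{name} {value} {unit}" for name, (value, unit) in readings.items()]
    return "Device readings: " + ", ".join(parts) + "."
```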

Preferably, the system 1700 may also include an emergency services module (not shown) that may be used to automatically call medical emergency services for the patient. The specific details of the operation of this module may be found in the description above.

Preferably, the system 1700 may also include an appointment module (not shown) that may be used to automatically register appointments for the patient. The specific details of the operation of this module may be found in the description above.

Optionally, the system 1700 may also include a transfer learning module (not shown). The transfer learning module can perform transfer learning on the medical question-answer knowledge base (or on the knowledge base after replies have been saved to it), so that it suits more regions or scenarios.

Fig. 18 shows a schematic block diagram of an apparatus 1800 for implementing a system or method in accordance with one or more embodiments of the present specification. The apparatus may include a processor 1810 configured to perform any of the methods described above, and a memory 1815.

The apparatus 1800 may include a network connection element 1825, which may include, for example, a network connection device connected to other devices through a wired connection or a wireless connection. The wireless connection may be, for example, a WiFi connection, a Bluetooth connection, a 3G/4G/5G network connection, or the like.

The apparatus 1800 may also optionally include other peripheral elements 1820, such as input devices (e.g., keyboard, mouse) and output devices (e.g., display). For example, in a method based on user input, a user may perform an input operation via an input device. Corresponding information may also be output to the user via the output device.

Each of these modules may communicate with each other directly or indirectly, e.g., via one or more buses, such as bus 1805.

Also, disclosed herein is a computer-readable storage medium comprising computer-executable instructions stored thereon that, when executed by a processor, cause the processor to perform the methods of the embodiments described herein.

Additionally, an apparatus is disclosed that includes a processor and a memory having stored thereon computer-executable instructions that, when executed by the processor, cause the processor to perform the method of the embodiments described herein.

Additionally, a system comprising means for implementing the methods of the embodiments described herein is also disclosed.

It is to be understood that methods according to one or more embodiments of the present description can be implemented in software, firmware, or a combination thereof.

It should be understood that the embodiments in the present specification are described in a progressive manner, and the same or similar parts in the embodiments are referred to each other, and each embodiment is described with emphasis on the differences from the other embodiments. In particular, as to the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple and reference may be made to some descriptions of the method embodiments for related points.

It should be understood that the above description describes particular embodiments of the present specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

It should be understood that describing an element herein in the singular, or showing only one such element in the figures, does not mean that the number of that element is limited to one. Furthermore, modules or elements described or illustrated herein as separate may be combined into a single module or element, and modules or elements described or illustrated herein as single may be split into multiple modules or elements.

It is also to be understood that the terms and expressions employed herein are used as terms of description and not of limitation, and that the embodiment or embodiments of the specification are not limited to those terms and expressions. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be looked to in order to cover all such equivalents.

Also, it should be noted that while the present invention has been described with reference to specific examples, it should be understood by those skilled in the art that the above embodiments are merely illustrative of one or more embodiments of the present invention, and that various changes and substitutions of equivalents may be made without departing from the spirit of the invention, and therefore, it is intended that all changes and modifications to the above embodiments be included within the scope of the appended claims.

Claims

1. A computer-implemented method of automatically replying to a medical consultation, comprising:

acquiring a medical conversation preamble, wherein the medical conversation preamble comprises medical consultation;

retrieving, using a retrieval model, one or more first candidate replies to the medical consultation in a medical question and answer knowledge base based on the medical conversation preamble;

automatically generating one or more second candidate replies to the medical consultation using a dialogue generation model based on the medical dialogue preamble, the one or more first candidate replies and the one or more second candidate replies constituting a candidate reply set;

scoring the candidate replies in the candidate reply set by using a scoring model; and

selecting a reply to the medical consultation from the set of candidate replies based on a scoring result of the candidate replies in the set of candidate replies.

2. The method of claim 1, wherein the conversation generation model is trained using a conversation library associated with a particular territory.

3. The method of claim 1, wherein obtaining the medical session preamble comprises receiving a telephone voice from a user and converting the telephone voice to text, and further comprising converting the reply to telephone voice for output to the user's telephone.

4. The method of claim 3, wherein the telephony speech is dialect and converting the telephony speech to text comprises converting the dialect to Mandarin text and/or converting the reply to telephony speech comprises converting the reply to telephony speech in dialect form.

5. The method of claim 1, further comprising:

determining role information associated with the medical conversation precursor, the role information indicating whether an originator of a session in the conversation precursor is a patient or a doctor; and

processing the medical session preamble based on the role information as input to the session generation model.

6. The method of claim 1, further comprising:

predicting entity information associated with the reply, the entity information relating to one or more of a symptom, a cause, a drug, a treatment regimen of the patient; and

using the predicted entity information together with the medical session preamble as input for the session generation model.

7. The method of claim 1, wherein the scoring model is trained based on a set of medical conversation samples, each conversation sample comprising a medical conversation precursor-reply pair, the conversation samples divided into positive samples and negative samples, wherein replies in positive samples match the medical conversation precursor and replies in negative samples do not match the medical conversation precursor.

8. The method of claim 1, further comprising:

automatically obtaining at least a portion of a medical profile of the patient from a medical facility based on the session preamble;

converting the at least a portion of the medical profile into text as part of the session preamble.

9. The method of claim 1, further comprising:

sending an instruction for acquiring health condition data of the patient to a health condition acquisition device based on the session preamble;

receiving health data from the health collection device; and

converting the health data to text as part of the dialog context.

10. The method of claim 1, further comprising:

saving the reply in the knowledge base of medical question and answer based on the feedback to the reply.

11. The method of claim 1, further comprising:

automatically calling medical emergency services for the patient; or

automatically performing appointment registration for the patient at a hospital.

12. The method of claim 1, wherein at least a portion of the medical session preamble is from a physician, and wherein the method is used to assist a physician in performing a diagnosis.

13. A computer-implemented system for automatically replying to medical advice, comprising:

a retrieval model for retrieving, in a knowledge base of medical questions and answers, one or more first candidate replies to medical advice included in a preamble of a medical session based on the preamble of the medical session;

a conversation generation model for automatically generating one or more second candidate replies to the medical consultation based on the medical conversation preamble, the one or more first candidate replies and the one or more second candidate replies constituting a set of candidate replies;

a scoring model for scoring the candidate replies in the set of candidate replies; and

a reply module to select a reply to the medical consultation from the set of candidate replies based on a scoring result of the candidate replies in the set of candidate replies.

14. The system of claim 13, wherein the conversation generation model is trained using a conversation library associated with a particular territory.

15. The system of claim 13, further comprising a pre-processing module, wherein the pre-processing module is configured to convert received telephone speech from the user into text, and the reply module is further configured to convert the reply into dialect-form telephone speech.

16. The system of claim 13, further comprising:

a role determination model to determine role information associated with the medical conversation preamble, the role information indicating whether an originator of the session in the conversation preamble is a patient or a doctor, wherein the medical conversation preamble is processed based on the role information as an input to the conversation generation model.

17. The system of claim 13, further comprising:

an entity prediction model to predict entity information associated with the reply, the entity information relating to one or more of a symptom, a cause, a drug, a treatment plan of the patient, wherein the predicted entity information is input to the session generation model along with the medical session preamble.

18. The system of claim 13, further comprising a medical profile acquisition module for automatically acquiring at least a portion of a patient's medical profile from a medical facility based on the session preamble, wherein the at least a portion of the medical profile is included as part of the session preamble.

19. The system of claim 13, further comprising a health data acquisition module to:

receiving health condition data from the health condition acquisition device; and

converting the health data to text as part of the dialog context.

20. The system of claim 13, further comprising:

an emergency services module for automatically calling medical emergency services for the patient; or

an appointment module for automatically registering an appointment for the patient.

21. The system of claim 13, further comprising:

a saving module to selectively save the reply in the knowledge base of medical questions and answers based on the feedback to the reply.

22. An apparatus for implementing automatic question answering, comprising:

a memory; and

a processor configured to perform the method of any one of claims 1-12.

23. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the method of any of claims 1-12.