WO2023075683A2 - A clinical simulation system and method - Google Patents


Info

Publication number
WO2023075683A2
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
intent
variable
sentences
output
Prior art date
Application number
PCT/SG2022/050751
Other languages
French (fr)
Other versions
WO2023075683A3 (en)
Inventor
Tat Yang KOH
Zi Ning Anthea FOONG
Han Wei NG
Jeremy Teng Yuen ONG
Eng Tat KHOO
Gabriel LIU
Original Assignee
National University Of Singapore
National University Hospital (Singapore) Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University Of Singapore, National University Hospital (Singapore) Pte Ltd filed Critical National University Of Singapore
Publication of WO2023075683A2 publication Critical patent/WO2023075683A2/en
Publication of WO2023075683A3 publication Critical patent/WO2023075683A3/en

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B23/00: Models for scientific, medical, or mathematical purposes, e.g. full-sized devices for demonstration purposes
    • G09B23/28: Models for scientific, medical, or mathematical purposes, e.g. full-sized devices for demonstration purposes, for medicine
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00: ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring

Definitions

  • This invention relates to a clinical simulation system and method. More particularly, this invention relates to a clinical simulation system and method wherein a user can carry out a conversation with a generated virtual patient.
  • VPs: electronic virtual patients
  • VPs are interactive screen-based computer simulations of real-life clinical scenarios that are widely used for the purposes of health sciences education. VPs allow for safe and supportive experiences for health sciences students to practice problem solving and diagnostic skills without endangering real patients.
  • One such solution is disclosed in Dmitriy Babichenko et al., “Moving Beyond Branching: Evaluating Educational Impact of Procedurally-Generated Virtual Patients” (DOI: 10.1109/SeGAH.2019.8882436).
  • virtual patient cases are procedurally generated leveraging Bayesian network (BN) models learned from electronic medical records data to present clinical scenarios and control outcomes of learners' decisions within the context of a presented VP.
  • BN: Bayesian network
  • the method further includes receiving a speech input string via the virtual assistant during the conversation, converting the speech input string to converted data by substituting portions of the speech input string with sets of terms bearing corresponding labels that refer back to the respective labels of the knowledge base, matching the respective labels to the corresponding labels to identify a digital response to the speech input string, and causing the digital response to be presented to the user in real-time via the virtual assistant.
  • a method for clinical simulation includes generating a virtual patient that is defined by multiple variables.
  • the method includes obtaining an intent and associated output sentences that are associated with at least some of the variables and storing the intent and associated output sentences into a data store.
  • the method further includes receiving an input sentence from a user, splicing the input sentence received from the user to obtain one or more spliced sentences and paraphrasing the one or more spliced sentences to obtain one or more paraphrased sentences.
  • the method includes obtaining at least one intent associated with the one or more paraphrased sentences from the data store and playing back to the user an output sentence associated with each of the at least one obtained intent.
  • the intent is obtained from input sentences associated with the output sentences.
  • the method further includes adding to the data store further input sentences obtained from each intent and associated input sentences.
  • the method further includes adding to the data store further output sentences obtained by paraphrasing the output sentences.
  • the input sentence obtained from the user is a spoken input sentence.
  • splicing is based on pauses in the spoken input sentence.
  • the data store is a natural language processing (NLP) model and wherein obtaining at least one intent associated with the at least one paraphrased sentence from the data store comprises obtaining at least one intent that has at least a predetermined confidence level as determined by the NLP model.
  • NLP: natural language processing
  • playing back to the user an output sentence comprises playing back an audio recording of the output sentence.
  • playing back an audio recording of the output sentence includes playing back the audio recording of the output sentence via a headset of the user, wherein the audio recording is stored on the headset.
  • the plurality of variables defining the virtual patient includes a first variable, a second variable and a third variable.
  • the first variable may be related to a second variable via a first probability distribution and a second probability distribution that is different from the first probability distribution.
  • the first variable may be related to both the second variable and the third variable.
  • only one of the second variable and the third variable is selected for obtaining intent and associated output sentences that are associated therewith.
  • the third variable may have a value that is obtained based on a higher of a first probability associating the first node and the third node and a second probability associating the second node and the third node.
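The “max” gate described in the bullet above can be sketched as follows. This is a minimal, illustrative interpretation: the third variable is set using whichever of the two parent-edge probabilities is higher. The function and parameter names are assumptions for illustration, not from the specification.

```python
import random

def max_gate(p_first_to_third, p_second_to_third):
    """Illustrative 'max' gate: the third variable is activated according
    to the higher of the two probabilities on its incoming edges.
    Parameter names are made up for this sketch."""
    p = max(p_first_to_third, p_second_to_third)
    return random.random() < p
```

With probabilities 1.0 and 0.2 the gate always activates; with 0.0 and 0.0 it never does.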
  • a clinical simulation system that includes a server.
  • the server is operable to perform the different embodiments of the above-described method.
  • a program storage device readable by a computing device, tangibly embodying a program of instructions, executable by the computing device to perform the different embodiments of the above-described method.
  • Figure 1 is a block diagram illustrating a clinical simulation system including a headset communicatively coupled to a server;
  • Figure 2 is a flowchart showing a sequence of steps performed in the system in Figure 1 for clinical simulation, the sequence including a generate virtual patient step;
  • Figure 3 is a table of input sentences and corresponding output sentences obtained from experienced medical practitioners;
  • Figure 4 is the table in Figure 3, with additional variable labels and intents obtained from the input sentences, and added animation file references corresponding to the output sentences;
  • Figure 5 is the table shown in Figure 4 with additional output sentences categorized to avoid conflicts;
  • Figure 6 is a flowchart showing a detailed sequence of steps for the generate virtual patient step in Figure 2;
  • Figure 7 is a probabilistic model that is used for generating a virtual patient, the probabilistic model including a number of variables;
  • Figure 8 is a table of intention-answer pairs selected for the variables in Figure 7;
  • Figure 9 is a table of input sentences and corresponding paraphrased input sentences;
  • Figure 10 is a block diagram illustrating the clinical simulation system in Figure 1 in more detail;
  • Figure 11 is a diagram illustrating how an input sentence is spliced and paraphrased for processing by an NLP model;
  • Figure 12 is a diagram illustrating how a spliced sentence is paraphrased;
  • Figure 13 is a flowchart showing a sequence of steps for obtaining an output sentence for playback in response to an auditory input from a user;
  • Figure 14 is a diagram illustrating a virtual patient viewable via the headset during clinical simulation;
  • Figure 15 is a block diagram illustrating typical elements of the server in Figure 1 that may be appropriately programmed for use in the clinical simulation system;
  • Figure 16 is a table of expected input (intent) and corresponding output sentence with associated animation;
  • Figure 17 is the table in Figure 16 with added category labels and associated probabilities;
  • Figure 18 is the table in Figure 16 with added associated categories;
  • Figure 19 is the table in Figure 18 with an added topic;
  • Figure 20 is a diagram illustrating a “simple” gate between variables of a probabilistic model;
  • Figure 21 is a diagram illustrating an “either” gate between variables of a probabilistic model;
  • Figure 22 is a diagram illustrating an “or” gate between variables of a probabilistic model;
  • Figure 23 is a diagram illustrating a “max” gate between variables of a probabilistic model;
  • Figure 24 is a diagram illustrating another probabilistic model having a number of nodes; and
  • Figure 25 is a diagram illustrating the probabilistic model in Figure 24 with some nodes selected and other nodes deactivated.

DETAILED DESCRIPTION OF THE EMBODIMENTS
  • processor/controller and its plural form include microcontrollers, microprocessors, programmable integrated circuit chips such as application specific integrated circuit chip (ASIC), computer servers, electronic devices, and/or combination thereof capable of processing one or more input electronic signals to produce one or more output electronic signals.
  • the controller includes one or more input modules and one or more output modules for processing of electronic signals.
  • server and its plural form can include local, distributed servers, and combinations of both local and distributed servers.
  • the invention may be embodied in a clinical simulation method and system wherein a large number of virtual patients can be generated and wherein a trainee doctor is able to carry out a realistic conversation with a virtual patient.
  • Existing systems tend to be lacking in these respects.
  • the method includes generating a virtual patient that is defined by multiple variables.
  • the method includes obtaining an intent and associated output sentences that are associated with at least some of the variables and storing the intent and associated output sentences into a data store.
  • the method further includes receiving an input sentence from a user, splicing the input sentence received from the user to obtain one or more spliced sentences and paraphrasing the one or more spliced sentences to obtain one or more paraphrased sentences. Thereafter, the method includes obtaining at least one intent associated with the one or more paraphrased sentences from the data store and playing back to the user an output sentence associated with each of the at least one obtained intent.
  • FIG. 1 shows a system 2 for implementing the clinical simulation method described above.
  • the system 2 includes a headset 4 that is data communicatively coupled to a cloud-based server 6.
  • the headset 4 may be any virtual reality (VR) headset, such as but not limited to an Oculus Quest 2 Headset developed and sold by Meta Platforms (formerly Facebook, Inc.).
  • the Oculus Quest 2 headset 4 is capable of running as both a standalone headset with an internal, Android-based operating system, and with Oculus Rift-compatible VR software running on a desktop computer when connected over a suitable wired or wireless interface, such as USB or Wi-Fi.
  • the headset 4 includes touch controllers (not shown) that translate a user’s gestures, motions and actions directly into VR.
  • the user 8 (Figure 14) is therefore able to interact with a backend application running on the cloud-based server 6 that implements the method by inputting commands into the headset 4.
  • the headset 4 also includes a built-in microphone and speakers (both not shown).
  • the cloud-based server 6 includes one or more processors 10 (Figure 15) that are operable to perform the above-described method.
  • the backend application is data communicatively coupled to a data store, such as a database 12, storing input sentences and associated output sentences as well as their associated variable labels.
  • the backend application includes a virtual patient generation module 14 that determines and selects which output sentences in the database 12 to incorporate into a generated virtual patient 16 (Figure 14).
  • the backend application also includes a conversational artificial intelligence (CAI) module 18 that enables free flow informal speech with the generated virtual patient 16 in the context of a medical consultation.
  • the CAI module 18 includes a conversational pipeline 20 that receives auditory input sentence from a user and produces a contextually coherent output sentence from the generated virtual patient 16.
  • Prior to use of the system 2 by a user 8, data is collected from experienced medical practitioners. Input sentence 22 and associated output sentence 24 pairs are generated using the data provided by the experienced medical practitioners.
  • Figure 3 shows a table including input sentences 22 and corresponding output sentences 24 obtained from experienced medical practitioners.
  • Variable labels 26 and intents 28 are extracted from and associated with the input sentence 22 and output sentence 24 pairs in Figure 3. These are stored as a record in the database 12 using unstructured data. From the input sentences 22 in Figure 3, one is able to identify the input sentences 22 as relating to a back pain variable, with the intent 28 being to ask about the location of the back pain.
  • Figure 4 shows data that is stored in the database 12.
  • the database 12 may be created using a PostgreSQL database hosted on the Microsoft Azure Cloud Services. However, other databases may also be used.
  • the database 12 includes multiple records. Each record includes a variable label 26, one or more intents 28 associated with the variable label 26, one or more input and output sentence 22, 24 pairs associated with each intent 28, and an associated animation clip 30. Where conflicts exist, each record may include different output sentences 24 that are categorized based on another variable label 32. Figure 5 shows such a record wherein a conflict exists. In this case, the location of back pain is dependent on the patient’s age. For the same intent of “where is your pain?” under the variable label “back pain”, different output sentences 24 are available based on the patient’s age.
  • each output sentence 24 may be paraphrased to obtain variations thereof (not shown). These further output sentences allow the virtual patient 16 to “speak” with more natural sentences while still capturing the main context and meaning behind the original input sentence.
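The record structure described above (variable label, intents, input/output sentence pairs categorized by another variable label, and an animation clip reference, as in Figures 4 and 5) might be sketched as a nested structure like the following. All field names, sentences, the category scheme and the clip name are illustrative assumptions, not taken from the figures.

```python
# Hypothetical shape of one database record (all values illustrative):
record = {
    "variable_label": "back pain",
    "intents": [
        {
            "intent": "ask_pain_location",
            "input_sentences": ["Where is your pain?", "Where does it hurt?"],
            # Output sentences may be categorized by another variable label
            # (here, age) to resolve conflicts, as in Figure 5.
            "output_sentences": {
                "age:elderly": ["My lower back aches."],
                "age:young": ["It hurts in my upper back."],
            },
            "animation_clip": "clip_back_pain_point.anim",
        }
    ],
}

def outputs_for(record, intent_name, category):
    """Look up the output sentences for an intent, filtered by the
    categorizing variable label's value."""
    for item in record["intents"]:
        if item["intent"] == intent_name:
            return item["output_sentences"].get(category, [])
    return []
```

For example, `outputs_for(record, "ask_pain_location", "age:elderly")` returns the elderly-patient answer list.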
  • FIG. 2 is a flowchart showing a sequence of steps in a clinical simulation method 40 according to an embodiment of the invention. Specifically, the method 40 starts in a GENERATE VIRTUAL PATIENT step 42, wherein the processor 10 generates a virtual patient 16. The details of this step 42 are shown in Figure 6. The step 42 begins with a SELECT DISEASE step 44, wherein the user 8 is prompted via the headset 4 to enter a disease type. The user 8 is able to select a disease type via the headset 4.
  • the method 40 proceeds to a SELECT DEMOGRAPHICS AND DISEASE VARIABLES step 46, wherein the user 8 is prompted to select demographics of the virtual patient and disease variables 50 (Figure 7) associated with the selected disease.
  • the user 8 has the option to set the value of any demographic or disease variable.
  • the user 8 may also choose not to set the value of any of these demographic and disease variables and leave it to the system 2 to generate them.
  • This step 46 does not allow the user 8 to set every demographic and disease variable, so that a set of variables remains whose values are not finalized.
  • the variables 50 are related according to a probabilistic model 49 as shown in Figure 7.
  • the method 40 next proceeds to a RELATED UNFINALIZED VARIABLE? decision step 52, wherein the processor 10 determines if each unfinalized variable 50 is related to another variable 50. If it is determined in this RELATED UNFINALIZED VARIABLE? decision step 52 that no unfinalized variable is related to another variable, the method 40 proceeds to a SELECT RANDOM SETTING step 54 for the processor 10 to randomly set a value for each of these unfinalized variables. However, if it is determined in the RELATED UNFINALIZED VARIABLE? decision step 52 that one or more unfinalized variables are related to another variable, the method 40 proceeds to a RUN MODEL step 56, wherein the processor 10 processes the probabilistic model 49 to obtain values of any unfinalized variables. This RUN MODEL step 56 will be described in detail later.
  • the virtual patient 16 is defined by the probabilistic model 49 which has a number of variables 50 that represent the patient’s demographic, disease parameters and associated parameters. Each variable 50 is associated with one or more other variables 50. The variables 50 are related to one another whereby changing one variable 50 can invoke changes in other variables 50 depending on the situation.
  • the probabilistic model 49 relating one variable 50 to the other variables can be visualised in the form of a non-simple graph with nodes that are connected by edges.
  • Each of the nodes represent a possible variable 50 whose value will determine the input and output sentence 22, 24 pair to be incorporated into the virtual patient 16.
  • Each of the edges represent a probability of the variable to be present as a result of the other variable connected thereto.
  • the graphical linkages between nodes are non-simple because two connected variables can be related by different probabilities: the probability may change depending on which variable is determined to be the cause of the other. This is reflected in the non-simple cyclic graph, whereby the probability used depends on which node is selected for processing first.
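A minimal sketch of such a direction-dependent edge structure follows: each ordered pair of variables carries its own probability, so the edge from A to B can differ from the edge from B to A. The variable names and probability values are made up for illustration.

```python
# A non-simple directed graph as a mapping from (cause, effect) ordered
# pairs to edge probabilities. Both directions between the same two
# variables may be present with different values (illustrative numbers).
edges = {
    ("smoking", "back_pain"): 0.4,   # smoking treated as the cause
    ("back_pain", "smoking"): 0.1,   # back pain processed first instead
}

def edge_probability(cause, effect):
    """Return the probability of `effect` arising from `cause`;
    0.0 if no such directed edge exists."""
    return edges.get((cause, effect), 0.0)
```

Because the keys are ordered pairs, which node is "selected for processing first" determines which probability applies.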
  • the probabilistic model 49 may have any number of variables 50, although only four variables are shown in the probabilistic model 49 in Figure 7.
  • the number of variables may be small. But as more knowledge is gained of real patients, more variables 50 may be added to the model 49 to more accurately represent a virtual patient.
  • the probabilistic model 49 includes four variables 50: a disease variable, a weight variable, an age variable and an exercise habits variable.
  • the disease variable can have one of three values: dementia, heart arrhythmia and diabetes.
  • This disease variable is related to the weight variable.
  • the processor 10 traverses the probabilistic model 49 to obtain values for selected variables 50 that are unfinalized. For example, the user may have selected dementia as the disease for training.
  • the processor generates a random number between 0 and 1. If the value of the random number is between 0 and 0.33, the weight of the virtual patient is set to light. If the value is between 0.34 and 0.66, the weight is set to average. And if the value is between 0.67 and 1, the weight is set to heavy.
  • the values of the other variables 50 in the probabilistic model are selected in a similar manner using respective probabilistic distributions (not shown).
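The random-number thresholding described above can be sketched as follows. The equal-thirds split for the weight variable is taken from the text; the function name and the optional test hook `r` are illustrative.

```python
import random

def sample_weight(r=None):
    """Sample the weight variable from the distribution described above:
    light, average and heavy in equal thirds of the unit interval.
    `r` may be supplied for deterministic testing."""
    if r is None:
        r = random.random()
    if r <= 0.33:
        return "light"
    if r <= 0.66:
        return "average"
    return "heavy"
```

Each of the other unfinalized variables would be sampled the same way against its own probability distribution.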
  • the method 40 proceeds next to an OBTAIN INTENTS AND OUTPUT SENTENCES step 58 shown in Figure 6, wherein the processor 10 obtains corresponding intents 28 and output sentences 24 based on the value of each variable 50.
  • the virtual patient 16 is no longer defined by variables but rather by a set of intents with associated input sentences and output sentences.
  • the input sentence-output sentence pairs are collated to form a final pool of input sentence-output sentence pairs, referred hereinafter simply as intention-answer pairs.
  • each intention-answer pair captures merely the idea or gist of the input sentence 22 and does not indicate that the user 8 must use a specific input sentence in order to obtain the corresponding response in the form of the output sentence 24.
  • the answers 24 are not fixed in nature and may vary. More specifically, once an intent 28 is determined from an input sentence 22, one out of a number of possible answers 24 is randomly selected as a response to the input sentence 22.
  • Figure 8 shows variables 50 defining a virtual patient 16 and associated intention-answer pairs 60 that are in a final pool.
  • one or more of the probability distributions may be updated through a machine learning process, where the backend application determines probability distributions that are more reflective of statistically accurate patient models in real life and updates such probability distributions in the model 49. This is carried out in an UPDATE PROBABILITIES step 62 in Figure 6. The probability distributions are updated only if the user is determined to be a medical professional/expert in this step 62.
  • the method 40 next proceeds to an OBTAIN NLP MODEL step 64, wherein the processor 10 processes the input sentences 22 and associated intent 28 to obtain further input sentences 66 from which the associated intent 28 can be recognized. In this manner, more variations of input sentences 22 that map to each intent 28 are available.
  • the finalised pool of intention-answer pairs is converted into YAML files, which are necessary for training an NLP/sentiment analysis tool hosted on a machine learning (ML) training server 70 shown in Figure 10, such as but not limited to the open-source RASA software, spaCy and BERT, to obtain a baseline NLP model.
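A sketch of what that YAML conversion might look like, assuming the RASA 3.x NLU training-data format; the intent name and example sentences are illustrative, and a real pipeline would likely use a YAML library rather than string assembly.

```python
def to_rasa_nlu_yaml(intent_examples):
    """Render a pool of intents and their example input sentences into
    RASA-style NLU training YAML (format per RASA 3.x conventions)."""
    lines = ['version: "3.1"', "nlu:"]
    for intent, examples in intent_examples.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        for example in examples:
            lines.append(f"    - {example}")
    return "\n".join(lines)

yaml_text = to_rasa_nlu_yaml({
    "ask_pain_location": ["Where is your pain?", "Where does it hurt?"],
})
```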
  • ML: machine learning
  • the open-source RASA software uses a DIET classifier (not shown) known to those skilled in the art.
  • the DIET classifier identifies both the contextual information of the input sentence 22 and entity keywords. This allows the baseline NLP model to pick up on specific discriminatory words in the input sentence 22, which is well suited to the use case in medical contexts.
  • the baseline NLP model serves only to process inputs; furthermore, it accepts only textual inputs and outputs for training and testing.
  • the NLP model also serves as a data store for storing the intents and associated output sentences.
  • Based on these input sentences 22 obtained from the experienced medical practitioners and their associated intents, variations of the input sentences 22 bearing the same context are generated by the baseline NLP model. These further input sentences add to the stockpile of input sentences 22 that the system 2 can recognize and respond to. Through this training, data augmentation is performed, allowing the baseline NLP model to be trained to recognize more general questioning structures in the input sentences 22.
  • the baseline NLP model is trained to decode both the situational context and the medical context information of the input sentences. Furthermore, fewer upfront input sentences 22 from experienced medical practitioners are required to achieve robust NLP model performance.
  • the ML training server 70 outputs a zip file that defines the baseline NLP model. The zip file requires specific software to run, as is known to those skilled in the art.
  • the baseline NLP model is hosted on one or more model hosting servers 72 for scalability and redundancy purposes.
  • the processor 10 also converts all the output sentences 24 to audio files using any suitable text-to-speech server 74. Each audio file is associated with an output sentence 24 and has a unique reference number.
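The mapping from output sentences to uniquely numbered audio files might be sketched as follows. The file-naming scheme is an assumption for illustration; the text only requires that each file have a unique reference number that the server can later send to the headset.

```python
def register_audio(output_sentences):
    """Assign each output sentence a unique audio file reference.
    The server later transmits only this reference, and the headset
    plays the locally stored file (naming scheme is illustrative)."""
    audio_index = {}
    for n, sentence in enumerate(output_sentences, start=1):
        audio_index[sentence] = f"audio_{n:04d}.wav"
    return audio_index

index = register_audio(["My lower back aches.", "It hurts in my upper back."])
```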
  • the generated audio files are uploaded to the headset 4 as shown in Figure 10. At this point, the backend application is ready for use to start a conversation with the trainee doctor 8.
  • the method 40 next proceeds to a VIRTUAL PATIENT TRAINING step 78 shown in Figures 2 and 6. Specifically, the method 40 proceeds to a RECEIVE AUDIO INPUT step 80 shown in Figure 2, wherein the headset 4 receives via the microphone an input sentence 22 in the form of a spoken sentence from the user 8.
  • the backend application converts this spoken input sentence from audio to text.
  • This audio-to-text conversion is carried out using voice recognition software provided by the Microsoft Azure Services 82 shown in Figure 10.
  • the audio input is transcribed into text using the Speech-To-Text services provided by the software.
  • the audio input may be pre-processed by the Conversational Artificial Intelligence (CAI) module 18 prior to being input into the voice recognition software.
  • CAI: Conversational Artificial Intelligence
  • the method 40 next proceeds to a SPLICE INPUT SENTENCE step 84 shown in Figure 2, wherein the processor 10 splices the input sentence 22 in text form into one or more spliced sentences in a splicing model 86 as shown in Figure 10.
  • the splicing model 86 is achieved by creating a machine learning algorithm that breaks an input sentence into its respective component spliced sentences 88, 90 as shown in Figure 11.
  • the purpose of the splicing model is to allow the baseline NLP model to better handle input sentences 88 that do not carry meaningful intent as well as sentences 90 that contain multiple intentions.
  • the separated textual input can then be eventually fed into the baseline NLP model individually to obtain the baseline NLP model’s output response.
  • the baseline NLP model will be able to produce an appropriate output sentence 24 for each of the individual intentions behind the original input sentence 22 and thus be able to give a multi-intention response.
  • an example textual input sentence is “Um. I see. How are you? Where are you feeling pain?”
  • the splicing model 86 will break this input sentence 22 up into four spliced sentences, “Um.”, “I see.” 88, “How are you?” and “Where are you feeling pain?” 90.
  • the breaks in the input sentence 22 may correspond to pauses in the spoken input sentence 22.
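The splicing behaviour in the example above can be approximated with a simple sketch in which sentence-ending punctuation in the transcribed text stands in for the pauses in the spoken input. The actual splicing model 86 is a trained machine learning algorithm, so this is only an illustrative stand-in.

```python
import re

def splice(text):
    """Break a transcribed input sentence into its component spliced
    sentences. Punctuation here stands in for spoken pauses; the real
    splicing model is a machine learning algorithm."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

splice("Um. I see. How are you? Where are you feeling pain?")
# → ['Um.', 'I see.', 'How are you?', 'Where are you feeling pain?']
```

The filler fragments (“Um.”, “I see.”) and the multi-intention remainder can then be fed to the baseline NLP model individually.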
  • the method 40 next proceeds to a PARAPHRASE SPLICED SENTENCES step 92 shown in Figure 2, wherein the processor 10 paraphrases each spliced sentence to obtain one or more paraphrased sentences 93 for each spliced sentence 88, 90 in a paraphrasing model 94.
  • the method 40 next proceeds to an OBTAIN INTENT AND OUTPUT SENTENCE step 98.
  • the baseline NLP model will output one or more intents and corresponding confidence levels as shown in Figure 11.
  • Figure 12 illustrates how a spliced sentence 90 is paraphrased using the paraphrasing model 94.
  • the paraphrasing model may be created using the T5 transformer, which is pre-trained on the C4 dataset comprising 750 GB of English-language text sourced from the web, as is known to those skilled in the art. Subsequently, the pre-trained model is used in the system 2 to paraphrase text in the medical context. To ensure that the paraphrased sentences 93 maintain the contextual information, hard minimum and maximum thresholds on the number of characters in a sentence 88, 90 are set to determine if the sentence 88, 90 should be paraphrased.
  • incorrect transcription of audio inputs by the Microsoft Azure Services 82 may be addressed by utilising an additional machine learning natural language processor to convert the transcribed auditory inputs into a paraphrased sentence with perfect grammar and sentence structure while maintaining the sentence intention.
  • This paraphrasing model 94 may also be applied onto the training dataset, more specifically input sentences 22, in the OBTAIN NLP MODEL step to allow the baseline NLP model to also recognise the rephrased input sentences in addition to the input sentences 22.
  • This paraphrasing model 94 is developed using a separate natural language processor designed to identify key vectors in the given input in order to produce a corrected paraphrased sentence with near-identical meaning.
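The character-count gating described above might look like the following sketch. The threshold values 15 and 120 are assumptions (the text does not give the actual numbers), and the commented-out T5 call shows one plausible shape of the paraphrase step using the Hugging Face `transformers` library, with model name and generation settings also assumed.

```python
MIN_CHARS, MAX_CHARS = 15, 120  # illustrative thresholds, not from the text

def should_paraphrase(sentence, lo=MIN_CHARS, hi=MAX_CHARS):
    """Gate sentences by length so paraphrasing preserves contextual
    information: very short fillers and very long inputs pass through
    unchanged."""
    return lo <= len(sentence) <= hi

# A paraphrase call on a T5 model pre-trained on C4 might look roughly
# like this (sketch only; model name and settings are assumptions):
# from transformers import T5Tokenizer, T5ForConditionalGeneration
# tok = T5Tokenizer.from_pretrained("t5-base")
# model = T5ForConditionalGeneration.from_pretrained("t5-base")
# ids = tok("paraphrase: " + sentence, return_tensors="pt").input_ids
# out = model.generate(ids, num_beams=5, num_return_sequences=3)
```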
  • the method 40 next proceeds to a PLAYBACK OUTPUT SENTENCE step 100 as shown in Figure 2, wherein the processor 10 plays back an output sentence for each intent classification with a confidence level that is above a predetermined threshold. That is, the processor 10 discards any unrecognized intentions and retains only intent classifications above the predetermined threshold. An output sentence 24 corresponding to each retained or recognized intent classification is then selected as the response output in chronological order, thus allowing the baseline NLP model to give a differentiated multi-response to a single auditory input sentence. Specifically, the baseline NLP model randomly selects an output sentence 24 associated with each recognized intent. The NLP model then sends the reference number of the audio file corresponding to the selected output sentence 24 to the headset 4.
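The confidence-threshold filtering and random answer selection in this step might look like the following sketch. The 0.7 default threshold, the tuple layout of the NLP outputs and all names are illustrative assumptions.

```python
import random

def select_responses(nlp_outputs, threshold=0.7):
    """Keep intent classifications at or above the confidence threshold
    (0.7 is an illustrative value), randomly pick one output sentence
    per retained intent, and preserve chronological order."""
    responses = []
    for intent, confidence, output_sentences in nlp_outputs:
        if confidence >= threshold and output_sentences:
            responses.append((intent, random.choice(output_sentences)))
    return responses
```

For example, an input that classified as a greeting with 0.95 confidence and as noise with 0.10 confidence would yield only the greeting response.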
  • the headset 4 will then obtain the audio file with the reference number stored thereon and play the audio file via the speakers of the headset 4 to the user 8.
  • the animation clip associated with the virtual patient 16 is also played on the display of the headset 4.
  • the display will show the virtual patient 16 in a scene environment, for example in a clinic setting, seated beside the trainee doctor 8, responding to the questions of the trainee doctor 8 during a conversation therewith as shown in Figure 14.
  • the virtual patient 16 is a 3D model created using the Unity3D software corresponding to the demographic variables selected earlier in the GENERATE VIRTUAL PATIENT step 42.
  • the method loops around the steps 80, 84, 92, 98, 100 where the assessment continues with the user 8 asking more questions 22 and obtaining more responses 24 from the backend application until the user 8 ends the training session.
  • the user is able to view and interact with the virtual patient 16 through the headset.
  • the user 8 is able to ask the virtual patient 16 questions 22.
  • Each question 22 is captured and analyzed by the backend application to obtain an intent 28 and to pick a response/output sentence 24 associated with the intent 28 and play it back to the user 8 via the speakers in the headset 4.
  • the user 8 can run through medical simulation scenarios. While the user 8 is undergoing the simulation session, relevant performance data is collected and stored onto a database. At the end of the session, the database can then be referenced by the user to receive relevant performance insights. These insights include but are not limited to the time taken to complete the scenario, whether the appropriate questions were asked, how structured was the conversation and whether the diagnosis given was accurate.
  • the system is also able to use the collected data to output suggestions on areas the user can improve upon as well as possible scenarios where the user can practice on the weaker areas.
  • the NLP model can also be updated to provide more focus on the weaker areas.
  • FIG. 15 is a block diagram illustrating typical elements of the server 6 that may be appropriately programmed for use in the clinical simulation system 2 described above.
  • the elements include the programmable processor 10 connected to a system memory 102 via a system bus 104.
  • the processor 10 accesses the system memory 102 as well as other input/output (I/O) channels 106 and peripheral devices 108.
  • the server 6 further includes at least one program storage device 110, such as a CD-ROM, tape, magnetic media, EPROM, EEPROM, ROM or the like.
  • the server 6 stores one or more computer programs that implement the method 40 of clinical simulation according to an embodiment of the present invention.
  • the processor 10 reads and executes the one or more computer programs to perform the method 40.
  • Each of the computer programs may be implemented in any desired computer programming language (including machine, assembly, high level procedural, or object oriented programming languages). In any case, the language may be a compiled or interpreted language.
  • the system provides a realistic platform for training trainee doctors.
  • the trainee doctors can use the system to practice any time anywhere. There is no need for physical interaction between a trainee doctor and a real patient, thus reducing the risk of transmission of diseases.
  • the system 2 allows for the creation of rare disease cases for which it would otherwise be difficult to find and train a standardised patient.
  • the system 2 is also able to better mimic the disease symptoms such as the movement animations of the patient as well as the visual symptoms of the disease.
  • the probabilistic model for generating a virtual patient in the system 2 allows the scenarios to scale, so that an effectively unlimited number of scenarios can be generated to train on. With the large number of virtual patients/scenarios, the system 2 also prevents users from memorising outcomes. It also adds replay value.
  • the probabilistic model also allows for finer control of the training scenario to focus on areas where there is a need for improvement.
  • the system can also capture performance data for the different users and let a user know where he or she stands compared to the others. Healthcare education providers will benefit from cost savings given that the burden of providing patients to trainee doctors for learning is reduced. Standardization in teaching can be achieved as well, thereby ensuring that every trainee doctor will be continuously exposed to evidence-based best practice care.
  • the conversational AI module is able to simulate realistic conversation between the trainee doctor and the virtual patient.
  • the trainee doctor is able to interact with the created virtual patient within the virtual reality environment, whereby the trainee doctor can use natural informal speech to communicate with the virtual patient.
  • Test results of the system 2 indicate the efficacy of the system in achieving realistic virtual patients that are capable of understanding free flow speech in a medical context.
  • the NLP model created also has the ability to classify layman terms into the appropriate medical terminology and vice versa, whereas most commonly used NLP models are trained to only recognise and classify the medical terminology.
  • the auditory input sentence that can be used by the user will encompass a wider range of vocabulary and more closely mimics the speech used in the medical field.
  • the NLP model with high classification accuracy can be translated to future virtual characters to emulate true-to-life conversations, creating possibilities of individual and multiplayer peer-to-peer real-time practice.
  • because the splicing model is able to handle non-meaningful intents very well, the original NLP model does not need to be trained to recognise these non-meaningful intents since only parts of the input with meaningful intentions will cross the threshold for the eventual contribution towards the model output. Therefore, less data will be required to train the NLP model and only data pertaining to the relevant content is required.
  • the method 40 can also be implemented in a standalone computing device, such as but not limited to a headset, a mobile device, a laptop, desktop, or a headset connected to another computing device.
  • the method can also be deployed in other platforms, such as but not limited to, a chatbot to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
  • the intent-answer pairs may alternatively be organized as shown in the tables in Figures 16-19.
  • the table in Figure 16 has an “Expected Input (Intent)” field type, an “Output Sentence” field type and an “Animation” field type that are related to one another.
  • Multiple records 120 are stored in the table.
  • One such record includes “Can you show me where your pain is?” as an “Expected Input (Intent)” field, “It hurts at my lower back area.” as an “Output Sentence” field and “Animation Number: X” as an “Animation” field.
  • a virtual patient 16 consists of a set of conversational sentences 24 which, given the correct prompt 22, will produce an appropriate conversational sentence 24 as an output.
  • Every sentence output 24 given by the virtual patient 16 requires two parts. Firstly, the expected input 22 and secondly, the output sentence 24 given the expected input 22.
  • the expected input sentence 22 and the output sentence 24 correspond to the intent 28 of the user 8 and the utterance 24 of the virtual patient 16 respectively.
  • the expected input 22 includes a list of possible sentences 22 that trigger corresponding outputs 24, and the output sentence 24 contains a single original sentence as well as a list of variations 66 that have identical contextual meaning as the original output sentence 24. These variations 66 may be obtained by paraphrasing the original sentence as described above.
  • Each expected input 22 and associated output sentence 24 form a data pair.
  • a virtual patient may be defined by multiple different data pairs for the purposes of mimicking natural conversation.
  • Each of the sentence data pairs also contain a corresponding animation 30.
  • the animations are a list of preset animations each containing a unique number. These animations are then assigned to every sentence data pair.
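The sentence data pair described above can be sketched as a small record type. The field names are illustrative, the paraphrased variation is invented for the example, and the animation number is a placeholder (Figure 16 only shows "Animation Number: X").

```python
from dataclasses import dataclass, field


@dataclass
class SentenceDataPair:
    """One intent-answer pair making up part of a virtual patient (illustrative)."""
    expected_inputs: list[str]   # possible sentences that trigger this pair
    output_sentence: str         # the single original output sentence
    # paraphrased variations with identical contextual meaning
    variations: list[str] = field(default_factory=list)
    animation_number: int = 0    # unique number of the assigned preset animation


pair = SentenceDataPair(
    expected_inputs=["Can you show me where your pain is?"],
    output_sentence="It hurts at my lower back area.",
    variations=["My lower back hurts."],  # invented paraphrase for illustration
    animation_number=1,                   # placeholder for "Animation Number: X"
)
```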
  • the set of data pairs that make up the virtual patient 16 is automatically selected from a database of possible data pairs as described above.
  • a list of category labels 122 is also appended to the data pair together with a set of probabilities 124 for each category label.
  • the category labels 122 are predetermined string labels and are applied across all the sentence data pairs.
  • the probabilities 124 are floating-point numbers which indicate the likelihood of the selection of the sentence data pair given the presence of the category.
  • the category labels 122 and the associated probabilities 124 are shown in the table in Figure 17.
  • Each probability may be represented as a conditional probability: the probability that the sentence data pair is selected given the category labels associated with the sentence data pair.
  • the probabilities 124 are obtained from governmental databases as well as from expert opinions, which help determine the presence of possible sentences that can be said by a virtual patient given the prior presence of other variables. Since each sentence 24 also contains elements which may affect the presence or absence of other sentences, each sentence data pair has a list of associated category labels 126 appended to it. These associated category labels 126 approximately describe the key elements in each sentence and are also used to determine the selection of other sentence data pairs. These associated category labels 126 are shown in the table in Figure 18.
  • sentence data pairs together with the appended category labels 126 and probabilities are segmented into different sentence-type topics 128.
  • These sentence-type topics indicate the nature of the content of the sentence, and all sentences which are similar to one another are placed together in one such sentence-type topic. This is to ensure that only one sentence may be selected out of these similar sentences.
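Putting the category probabilities and sentence-type topics together, the selection of at most one data pair per topic might be sketched as below. The dict field names and the use of the highest contributing probability are assumptions for illustration.

```python
import random


def select_pair_for_topic(topic_pairs, active_categories):
    """Select at most one sentence data pair out of a sentence-type topic.

    topic_pairs: list of dicts, each carrying "category_probabilities", a
        mapping from a category label to the likelihood of selecting the pair
        given the presence of that category (illustrative field name).
    active_categories: set of category labels already present for the patient.
    """
    activated = []
    for pair in topic_pairs:
        probs = [p for label, p in pair["category_probabilities"].items()
                 if label in active_categories]
        if probs and random.random() < max(probs):
            activated.append(pair)
    # similar sentences share one topic, so at most one may be selected
    return random.choice(activated) if activated else None
```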
  • Organization of the data pairs with associated categories 126 and topic 128 is shown in the table in Figure 19. The database created will consist of all possible sentence data pairs, structured as shown in Figures 16-19.
  • the virtual patient generation step based on a probabilistic model is described to be used in conjunction with subsequent specific training steps involving a conversation between a trainee doctor and a virtual patient. This is not to be construed as limiting.
  • the virtual patient generation step may be standalone or used in conjunction with other steps not necessarily involving splicing and paraphrasing as described above.
  • the probabilistic model described above includes nodes that are linked to one another. The links between two nodes act like “gates”. In the probabilistic model 49 shown in Figure 7, only two types of gates are shown, a “simple” gate 150 and an “either” gate 152.
  • the “simple” gate 150 as shown in Figures 7 and 20 is one whereby one node contributes to the probability of the presence of another node connected thereto in the probabilistic model 49, but not vice versa.
  • the “either” gate 152 is similar to the “simple” gate 150.
  • the main difference between this “either” gate 152 and the “simple” gate 150 is that this “either” gate 152 involves two connected nodes whereby either of the nodes can contribute to the presence of the other node.
  • a first node may be related to a second node based on a first probability distribution 154 as shown in Figure 7.
  • the second node may be related to the first node based on a second probability distribution 156.
  • the first probability distribution 154 may be different from the second probability distribution 156.
  • the value of each node therefore depends on the sequence in which the nodes are visited as the model is processed. Any node in the model may be used as a starting node for processing the model.
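The order dependence of the "either" gate can be sketched as follows. The two distributions here are stand-in functions with assumed values, not figures from the source.

```python
def either_gate(first_value, second_value, first_to_second, second_to_first,
                visit_first_node_first):
    """'Either' gate sketch: either node can contribute to the presence of the
    other via two possibly different probability distributions, so the
    resulting probability depends on which node is visited first."""
    if visit_first_node_first:
        return first_to_second(first_value)   # first probability distribution 154
    return second_to_first(second_value)      # second probability distribution 156


# illustrative (assumed) distributions: the presence of one node raises the
# probability of the other, by a different amount in each direction
def dist_154(present):
    return 0.8 if present else 0.1


def dist_156(present):
    return 0.3 if present else 0.05
```

Because `dist_154` and `dist_156` differ, processing the first node first yields a different probability for the pair than processing the second node first, matching the order dependence described above.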
  • the probabilistic model 49 may include a third type of gate, an “or” gate 158.
  • This “or” gate 158 involves nodes that can contribute to more than one factor, but there are two factors that are mutually exclusive to one another. Therefore, this gate 158 prevents the selection of conflicting nodes by only allowing one of the nodes to be selected and the other node will be deactivated entirely together with all of its connections.
  • a first node may be related to both a second node and a third node, wherein only one of the second node and the third node can be selected for processing following the processing of the first node. The node that is not selected and any downstream nodes connected thereto will not be processed. The value of the selected node will be used for obtaining intent and associated output sentences associated with that node.
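The mutually exclusive selection of the "or" gate can be sketched as below. Normalising the two probabilities before choosing is an assumption; the source only states that exactly one of the conflicting nodes is selected and the other is deactivated.

```python
import random


def apply_or_gate(origin_active, p_second, p_third):
    """'Or' gate sketch: from an active origin node, select exactly one of two
    mutually exclusive downstream nodes; the unselected node, together with all
    of its downstream connections, is deactivated and never processed."""
    if not origin_active:
        return None  # neither conflicting node is processed
    total = p_second + p_third  # normalisation is an assumption, not from the source
    return "second" if random.random() < p_second / total else "third"
```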
  • the probabilistic model 49 may include a fourth type of gate, a “max” gate 160.
  • for the probabilistic model shown in Figure 23, almost every node in the constructed model has one or more connections that can contribute to the probability of its activation. Thus, in order to determine the probability of a node’s activation, a “max” gate 160 is used whereby, out of all the possible probabilities that can contribute to the activation, only the highest probability is selected. The only time this “max” gate is not invoked is when there is only one probability contributing to the node’s activation.
  • the value of a third node may be obtained based on a first probability associating the first node and the third node and a second probability associating the second node and the third node.
  • the value of the third node may be determined based on the higher of the first probability and the second probability.
  • the value of the third node will be used for obtaining intent and associated output sentences associated with that node.
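The "max" gate itself reduces to a one-line rule, sketched here: of all the probabilities contributed by connected nodes, only the highest one decides activation.

```python
def max_gate_activation_probability(incoming_probabilities):
    """'Max' gate sketch: out of all probabilities contributed by connected
    nodes, only the highest is used to decide the node's activation.
    With a single contribution, the gate simply passes that probability through."""
    return max(incoming_probabilities)
```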
  • once its contributing probabilities are resolved, the node's activation probability will be decided.
  • in the event that the node is determined not to be activated, the node as well as all of its possible downstream linkages will be deactivated.
  • this will affect certain gates like the “either” and “or” gates.
  • the “either” gate will be deconstructed into a “simple” gate and the “or” gate will either be entirely deconstructed or reformed into a “simple” gate depending on whether the node deactivated is the origin node or one of the ‘or’ nodes respectively as shown in Figure 24.
  • nodes 2, 3 and 5 are selected nodes while nodes 4 and 6 are deactivated nodes.
  • for the gates, there could be many gate outputs from a single node. For instance, a single node may have both the “simple” and the “or” gate, whereby it contributes separately to each of the gates. Another important factor to take into consideration is that the gates can be compounded together to form a complex hybrid gate. For example, the “or” gate or “max” gate can be combined alongside the “either” gate to form and reflect more complex relationships between two nodes.
  • a realistic and contextually coherent virtual patient 16 can be created that reflects real world statistical data. Since it is well known that one variable does not necessarily indicate the presence of another confounding variable with 100% certainty, the probabilistic model proposed is more indicative of how the disease variables relate to one another. The virtual patients created as a result of such a probabilistic model will thus give rise to sufficient variation and still remain within realistic expectations.

Abstract

A clinical simulation method is disclosed. The method includes generating a virtual patient that is defined by multiple variables. The method includes obtaining an intent and associated output sentences that are associated with at least some of the variables and storing the intent and associated output sentences into a data store. The method further includes receiving an input sentence from a user, splicing the input sentence received from the user to obtain one or more spliced sentences and paraphrasing the one or more spliced sentences to obtain one or more paraphrased sentences. Thereafter, the method includes obtaining at least one intent associated with the one or more paraphrased sentences from the data store and playing back to the user an output sentence associated with each of the at least one obtained intent. A system for clinical simulation is also disclosed.

Description

A CLINICAL SIMULATION SYSTEM AND METHOD
TECHNICAL FIELD
[0001] This invention relates to a clinical simulation system and method. More particularly, this invention relates to a clinical simulation system and method wherein a user can carry out a conversation with a generated virtual patient.
BACKGROUND
[0002] The following discussion of the background to the invention is intended to facilitate an understanding of the present invention only. It should be appreciated that the discussion is not an acknowledgement or admission that any of the material referred to was published, known or part of the common general knowledge of the person skilled in the art in any jurisdiction as at the priority date of the invention.
[0003] Electronic virtual patients (VPs) are interactive screen-based computer simulations of real-life clinical scenarios that are widely used for the purposes of health sciences education. VPs allow for safe and supportive experiences for health sciences students to practice problem solving and diagnostic skills without endangering real patients. One such solution is disclosed in Dmitriy Babichenko et. al., “Moving Beyond Branching: Evaluating Educational Impact of Procedurally-Generated Virtual Patients" (Journal ID: 10.1109/SeGAH.2019.8882436). In the disclosed system, virtual patient cases are procedurally generated leveraging Bayesian network (BN) models learned from electronic medical records data to present clinical scenarios and control outcomes of learners' decisions within the context of a presented VP.
[0004] Another solution is disclosed in Jordan J. Bird et al., “Chatbot Interaction with Artificial Intelligence: human data augmentation with T5 and language transformer ensemble for text classification" (Journal ID: https://doi.org/10.1007/s12652-021-03439-8). The solution discloses a chatbot interaction with artificial intelligence (CI-AI) framework as an approach to the training of a transformer based chatbot-like architecture for task classification with a focus on natural human interaction with a machine as opposed to interfaces, code, or formal commands. The intelligent system augments human-sourced data via artificial paraphrasing in order to generate a large set of training data for further classical, attention, and language transformation-based learning approaches for Natural Language Processing (NLP). Human beings are asked to paraphrase commands and questions for task identification for further execution of algorithms as skills. The commands and questions are split into training and validation sets. The training set is paraphrased by the T5 model in order to augment it with further data. This resulted in a highly-performing model that allows the intelligent system to interpret human commands at the social-interaction level through a chatbot-like interface.
[0005] Another solution is disclosed in U.S. Patent No. 10438610B2, Brown et. al., entitled “Virtual assistant conversations”. The solution is implemented as a method under the control of one or more processors executing computerized instructions stored in a memory. The method includes causing a virtual assistant to be presented via a computing device to enable a conversation with a user in a natural language. The computerized instructions are configured via a graphical user interface that receives features of a knowledge base for storage in the memory. The features of the knowledge base are organized to trigger outputs according to units of vocabulary patterns arranged in the features of the knowledge base. The units of vocabulary patterns are stored in the memory with respective labels for each feature. The method further includes receiving a speech input string via the virtual assistant during the conversation, converting the speech input string to converted data by substituting portions of the speech input string with sets of terms bearing corresponding labels that refer back to the respective labels of the knowledge base, matching the respective labels to the corresponding labels to identify a digital response to the speech input string, and causing the digital response to be presented to the user in real-time via the virtual assistant.
SUMMARY
[0006] According to an aspect of the present disclosure, there is provided a method for clinical simulation. The method includes generating a virtual patient that is defined by multiple variables. The method includes obtaining an intent and associated output sentences that are associated with at least some of the variables and storing the intent and associated output sentences into a data store. The method further includes receiving an input sentence from a user, splicing the input sentence received from the user to obtain one or more spliced sentences and paraphrasing the one or more spliced sentences to obtain one or more paraphrased sentences. Thereafter, the method includes obtaining at least one intent associated with the one or more paraphrased sentences from the data store and playing back to the user an output sentence associated with each of the at least one obtained intent.
[0007] In some embodiments of the method, the intent is obtained from input sentences associated with the output sentences.
[0008] In some embodiments of the method, the method further includes adding to the data store further input sentences obtained from each intent and associated input sentences.
[0009] In some embodiments of the method, the method further includes adding to the data store further output sentences obtained by paraphrasing the output sentences.
[0010] In some embodiments of the method, the input sentence obtained from the user is a spoken input sentence.
[0011] In some embodiments of the method, splicing is based on pauses in the spoken input sentence.
[0012] In some embodiments of the method, the data store is a natural language processing (NLP) model and wherein obtaining at least one intent associated with the at least one paraphrased sentence from the data store comprises obtaining at least one intent that has at least a predetermined confidence level as determined by the NLP model.
[0013] In some embodiments of the method, playing back to the user an output sentence comprises playing back an audio recording of the output sentence.
[0014] In some embodiments of the method, playing back an audio recording of the output sentence includes playing back the audio recording of the output sentence via a headset of the user, wherein the audio recording is stored on the headset.
[0015] In some embodiments of the method, the plurality of variables defining the virtual patient includes a first variable, a second variable and a third variable. In some embodiments, the first variable may be related to the second variable via a first probability distribution and a second probability distribution that is different from the first probability distribution. In other embodiments, the first variable may be related to both the second variable and the third variable, and only one of the second variable and the third variable is selected for obtaining intent and associated output sentences that are associated therewith. In yet further embodiments, the third variable may have a value that is obtained based on the higher of a first probability associating the first variable and the third variable and a second probability associating the second variable and the third variable.
[0016] According to another aspect of the present disclosure, there is provided a clinical simulation system that includes a server. The server is operable to perform the different embodiments of the above-described method.
[0017] According to yet another aspect of the present disclosure, there is provided a program storage device readable by a computing device, tangibly embodying a program of instructions, executable by the computing device to perform the different embodiments of the above-described method.
[0018] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF DRAWINGS
[0019] The invention will be better understood with reference to the drawings, in which:
Figure 1 is a block diagram illustrating a clinical simulation system including a headset communicatively coupled to a server;
Figure 2 is a flowchart showing a sequence of steps performed in the system in Figure 1 for clinical simulation, the sequence including a generate virtual patient step;
Figure 3 is a table of input sentences and corresponding output sentences obtained from experienced medical practitioners;
Figure 4 is the table in Figure 3, with additional variable labels and intents obtained from the input sentences, and added animation file references corresponding to the output sentences;
Figure 5 is the table shown in Figure 4 with additional output sentences categorized to avoid conflicts;
Figure 6 is a flowchart showing a detailed sequence of steps for the generate virtual patient step in Figure 2;
Figure 7 is a probabilistic model that is used for generating a virtual patient, the probabilistic model including a number of variables;
Figure 8 is a table of intention-answer pairs selected for the variables in Figure 7;
Figure 9 is a table of input sentences and corresponding paraphrased input sentences;
Figure 10 is a block diagram illustrating the clinical simulation system in Figure 1 in more details;
Figure 11 is a diagram illustrating how an input sentence is spliced and paraphrased for processing by an NLP model;
Figure 12 is a diagram illustrating how a spliced sentence is paraphrased;
Figure 13 is a flowchart showing a sequence of steps for obtaining an output sentence for playback in response to an auditory input from a user;
Figure 14 is a diagram illustrating a virtual patient viewable via the headset during clinical simulation;
Figure 15 is a block diagram illustrating typical elements of the server in Figure 1 that may be appropriately programmed for use in the clinical simulation system;
Figure 16 is a table of expected input (intent) and corresponding output sentence with associated animation;
Figure 17 is the table in Figure 16 with added category labels and associated probabilities;
Figure 18 is the table in Figure 16 with added associated categories;
Figure 19 is the table in Figure 18 with an added topic;
Figure 20 is a diagram illustrating a “simple” gate between variables of a probabilistic model;
Figure 21 is a diagram illustrating an “either” gate between variables of a probabilistic model;
Figure 22 is a diagram illustrating an “or” gate between variables of a probabilistic model;
Figure 23 is a diagram illustrating a “max” gate between variables of a probabilistic model;
Figure 24 is a diagram illustrating another probabilistic model having a number of nodes; and
Figure 25 is a diagram illustrating the probabilistic model in Figure 24 with some nodes selected and other nodes deactivated.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0020] Throughout this document, unless otherwise indicated to the contrary, the terms “comprising”, “consisting of’, “having” and the like, are to be construed as non-exhaustive, or in other words, as meaning “including, but not limited to.”
[0021] Furthermore, throughout the specification, unless the context requires otherwise, the word “include” or variations such as “includes” or “including” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
[0022] Throughout the description, it is to be appreciated that the term ‘processor/controller’ and its plural form include microcontrollers, microprocessors, programmable integrated circuit chips such as application specific integrated circuit chip (ASIC), computer servers, electronic devices, and/or combination thereof capable of processing one or more input electronic signals to produce one or more output electronic signals. The controller includes one or more input modules and one or more output modules for processing of electronic signals.
[0023] Throughout the description, it is to be appreciated that the term ‘server’ and its plural form can include local, distributed servers, and combinations of both local and distributed servers.
[0024] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by a skilled person to which the subject matter herein belongs.
[0025] As shown in the drawings for purposes of illustration, the invention may be embodied in a clinical simulation method and system wherein a large number of virtual patients can be generated and wherein a trainee doctor is able to carry out a realistic conversation with a virtual patient. Existing systems tend to be lacking in these respects. Referring to Figures 2 and 6, the method includes generating a virtual patient that is defined by multiple variables. The method includes obtaining an intent and associated output sentences that are associated with at least some of the variables and storing the intent and associated output sentences into a data store. The method further includes receiving an input sentence from a user, splicing the input sentence received from the user to obtain one or more spliced sentences and paraphrasing the one or more spliced sentences to obtain one or more paraphrased sentences. Thereafter, the method includes obtaining at least one intent associated with the one or more paraphrased sentences from the data store and playing back to the user an output sentence associated with each of the at least one obtained intent.
[0026] Figure 1 shows a system 2 for implementing the clinical simulation method described above. The system 2 includes a headset 4 that is data communicatively coupled to a cloud-based server 6. The headset 4 may be any virtual reality (VR) headset, such as but not limited to an Oculus Quest 2 Headset developed and sold by Meta Platforms (formerly Facebook, Inc.). The Oculus Quest 2 headset 4 is capable of running as both a standalone headset with an internal, Android-based operating system, and with Oculus Rift-compatible VR software running on a desktop computer when connected over a suitable interface, such as USB or Wi-Fi. The headset 4 includes touch controllers (not shown) that translate a user’s gestures, motions and actions directly into VR. The user 8 (Figure 14) is therefore able to interact with a backend application running on the cloud-based server 6 that implements the method by inputting commands into the headset 4. The headset 4 also includes a built-in microphone and speakers (both not shown).
[0027] The cloud-based server 6 includes one or more processors 10 (Figure 15) that are operable to perform the above-described method. The backend application is data communicatively coupled to a data store, such as a database 12, storing input sentences and associated output sentences as well as their associated variable labels. The backend application includes a virtual patient generation module 14 that determines and selects which output sentences in the database 12 to incorporate into a generated virtual patient 16 (Figure 14). The backend application also includes a conversational artificial intelligence (CAI) module 18 that enables free flow informal speech with the generated virtual patient 16 in the context of a medical consultation. The CAI module 18 includes a conversational pipeline 20 that receives an auditory input sentence from a user and produces a contextually coherent output sentence from the generated virtual patient 16.
[0028] Prior to use of the system 2 by a user 8, data is collected from experienced medical practitioners. Input sentence 22 and associated output sentence 24 pairs are generated using the data provided by the experienced medical practitioners. Figure 3 shows a table including input sentences 22 and corresponding output sentences 24 obtained from experienced medical practitioners.
[0029] Variable labels 26 and intents 28 are extracted from and associated with the input sentence 22 and output sentence 24 pairs in Figure 3. These are stored as a record in the database 12 using unstructured data. From the input sentences 22 in Figure 3, one is able to identify the input sentences 22 as relating to a back pain variable and the intent 28 is to ask about the location of the back pain. Figure 4 shows data that is stored in the database 12. The database 12 may be created using a PostgreSQL database which is hosted on the Microsoft Azure Cloud Services. However, other databases may also be used.
[0030] The database 12 includes multiple records. Each record includes a variable label 26, one or more intents 28 associated with the variable label 26, one or more input and output sentence 22, 24 pairs associated with each intent 28, and an associated animation clip 30. Where conflicts exist, each record may include different output sentences 24 that are categorized based on another variable label 32. Figure 5 shows such a record wherein a conflict exists. In this case, the location of back pain is dependent on the patient’s age. For the same intent of “where is your pain?” under the variable label “back pain”, different output sentences 24 are available based on the patient’s age.
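A record of this shape might look like the following sketch. The field names, the second output sentence and the animation reference are invented for illustration; only the back pain example appears in the source.

```python
# One database record, following the structure described above (illustrative only).
record = {
    "variable_label": "back pain",
    "intents": [
        {
            "intent": "where is your pain?",
            # conflicting output sentences categorized by another variable
            # label (here, the patient's age)
            "outputs_by_age": {
                "elderly": "It hurts at my lower back area.",
                "young adult": "My whole back aches after work.",  # invented example
            },
            "animation_clip": "animation_clip_ref",  # placeholder file reference
        }
    ],
}
```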
[0031] Optionally, each output sentence 24 may be paraphrased to obtain variations thereof (not shown). These further output sentences allow the virtual patient 16 to “speak” with more natural sentences while still capturing the main context and meaning behind the original input sentence.
[0032] During use in training and/or assessment, the user 8, such as a trainee doctor, wears the headset 4 and runs the backend application via the headset 4. The backend application performs the clinical simulation method. Figure 2 is a flowchart showing a sequence of steps in a clinical simulation method 40 according to an embodiment of the invention. Specifically, the method 40 starts in a GENERATE VIRTUAL PATIENT step 42, wherein the processor 10 generates a virtual patient 16. The details of this step 42 are shown in Figure 6. Step 42 begins with a SELECT DISEASE step 44, wherein the user 8 is prompted via the headset 4 to enter a disease type. The user 8 is able to select a disease type via the headset 4. With the selected disease type, the method 40 proceeds to a SELECT DEMOGRAPHICS AND DISEASE VARIABLES step 46, wherein the user 8 is prompted to select demographics of the virtual patient and disease variables 50 (Figure 7) associated with the selected disease. In this step, the user 8 has the option to set the value of any demographic or disease variable. The user 8 may also choose not to set the value of any of these demographic and disease variables and leave it to the system 2 to generate them. This step 46, however, does not allow the user 8 to set all demographic and disease variables, so as to leave a set of variables whose values are not finalized. The variables 50 are related according to a probabilistic model 49 as shown in Figure 7. The method 40 next proceeds to a RELATED UNFINALIZED VARIABLE? decision step 52, wherein the processor 10 determines if each unfinalized variable 50 is related to another variable 50. If it is determined in this RELATED UNFINALIZED VARIABLE? decision step 52 that no unfinalized variable is related to another variable, the method 40 proceeds to a SELECT RANDOM SETTING step 54 for the processor 10 to randomly set a value for each of these unfinalized variables.
However, if it is determined in the RELATED UNFINALIZED VARIABLE? decision step 52 that one or more unfinalized variables are related to another variable, the method 40 proceeds to a RUN MODEL step 56, wherein the processor 10 processes the probabilistic model 49 to obtain values of any unfinalized variables. This RUN MODEL step 56 will be described in detail later.
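The variable-finalization logic of steps 46 to 56 might be sketched as below. The function and parameter names are assumptions for illustration, and `run_model` is a stand-in callable for the RUN MODEL step 56: related unfinalized variables are resolved by the probabilistic model, the rest are set uniformly at random.

```python
import random

def finalize_variables(user_choices, variable_domains, related_vars, run_model):
    """Fill in any variables the user left unset.

    user_choices:     dict of values the user finalized in step 46
    variable_domains: dict mapping each variable to its possible values
    related_vars:     set of variables related to others in the model
    run_model:        callable resolving a related variable (RUN MODEL step)
    """
    values = dict(user_choices)
    for var, domain in variable_domains.items():
        if var in values:
            continue  # already finalized by the user
        if var in related_vars:
            values[var] = run_model(var, values)   # step 56
        else:
            values[var] = random.choice(domain)    # step 54
    return values
```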
[0033] The virtual patient 16 is defined by the probabilistic model 49 which has a number of variables 50 that represent the patient’s demographic, disease parameters and associated parameters. Each variable 50 is associated with one or more other variables 50. The variables 50 are related to one another whereby changing one variable 50 can invoke changes in other variables 50 depending on the situation.
[0034] The probabilistic model 49 relating one variable 50 to the other variables can be visualised in the form of a non-simple graph with nodes that are connected by edges. Each of the nodes represents a possible variable 50 whose value will determine the input and output sentence 22, 24 pair to be incorporated into the virtual patient 16. Each of the edges represents the probability of the variable being present as a result of the other variable connected thereto. The reason for the non-simple graphical linkages between each node is that there are different probabilities relating two connected variables, whereby the probability might change depending on which variable is determined to be the cause of the other variable. Therefore, this is reflected in the non-simple cyclic graph, whereby the probability will change depending on which node is selected for processing first.
[0035] The probabilistic model 49 may have any number of variables 50, although only four variables are shown in the probabilistic model 49 in Figure 7. For an initial model 49, the number of variables may be small. But as more knowledge is gained of real patients, more variables 50 may be added to the model 49 to more accurately represent a virtual patient.
[0036] In the example shown in Figure 7, the probabilistic model 49 includes four variables 50: a disease variable, a weight variable, an age variable and an exercise habits variable. In this example, the disease variable can have one of three values: dementia, heart arrhythmia and diabetes. This disease variable is related to the weight variable. There is an edge going from the disease node to the weight node. This edge indicates that each disease type is associated with a patient weight by a respective probability distribution. For example, if the disease is dementia, the probabilities of the patient being light, of average weight and heavy are 0.33, 0.33 and 0.34 respectively. If the disease is heart arrhythmia, the probabilities of the patient being light, of average weight and heavy are 0.1, 0.3 and 0.6 respectively. And if the disease is diabetes, the probabilities of the patient being light, of average weight and heavy are 0.15, 0.35 and 0.5 respectively.
[0037] There is also an edge going in the opposite direction from the weight node to the disease node. Similarly, given a patient who is light in weight, the probabilities of the patient having dementia, heart arrhythmia and diabetes are 0.33, 0.33 and 0.34 respectively. And for a patient who is of average weight, the probabilities of the patient having dementia, heart arrhythmia and diabetes are 0.1, 0.3 and 0.6 respectively. For a patient who is heavy, the probabilities of the patient having dementia, heart arrhythmia and diabetes are 0.15, 0.35 and 0.5 respectively.
[0038] In the RUN MODEL step 56 shown in Figure 6, the processor 10 traverses the probabilistic model 49 to obtain values for selected variables 50 that are unfinalized. For example, the user may have selected dementia as the disease for training. In the RUN MODEL step 56, the processor generates a random number between 0 and 1. If the value of the random number is between 0 and 0.33, the weight of the virtual patient is set to light. If the value of the random number is between 0.34 and 0.66, the weight of the virtual patient is set to average. And if the value of the random number is between 0.67 and 1, the weight of the virtual patient is set to heavy. The values of the other variables 50 in the probabilistic model are selected in a similar manner using respective probability distributions (not shown).
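This sampling step can be sketched minimally using the cumulative thresholds of the dementia example above. The helper name and the injectable `rng` callable are assumptions added for testability.

```python
import random

def sample_weight(rng=random.random):
    """Sample the weight variable for a dementia patient.

    Uses the cumulative thresholds 0.33 / 0.66 / 1.0 from the
    probability distribution (light 0.33, average 0.33, heavy 0.34).
    """
    r = rng()  # a random number between 0 and 1
    if r <= 0.33:
        return "light"
    if r <= 0.66:
        return "average"
    return "heavy"
```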
[0039] Once the values of all variables 50 have been selected, the method 40 proceeds next to an OBTAIN INTENTS AND OUTPUT SENTENCES step 58 shown in Figure 6, wherein the processor 10 obtains corresponding intents 28 and output sentences 24 based on the value of each variable 50. At this point, the virtual patient 16 is no longer defined by variables but rather by a set of intents, associated input sentences and output sentences. The input sentence-output sentence pairs are collated to form a final pool of input sentence-output sentence pairs, referred to hereinafter simply as intention-answer pairs. It should be noted that the intention or intent in each intention-answer pair is merely the idea or gist of the input sentence 22 and does not indicate that the user 8 must use a specific input sentence in order to obtain the corresponding response in the form of the output sentence 24. In addition, the answers 24 are not fixed in nature and may vary. More specifically, once an intent 28 is determined from an input sentence 22, one out of a number of possible answers 24 is randomly selected as a response to the input sentence 22. Figure 8 shows variables 50 defining a virtual patient 16 and associated intention-answer pairs 60 that are in a final pool.
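The random selection of an answer from the final pool can be illustrated as follows. The pool contents are made up for the sketch; Figure 8 is not reproduced here.

```python
import random

# Illustrative final pool of intention-answer pairs: each intent maps to
# the set of possible answers the virtual patient may give.
final_pool = {
    "ask_pain_location": [
        "It hurts at my lower back area.",
        "The pain is in my lower back.",
    ],
}

def respond(intent, pool=final_pool):
    """Randomly select one of the possible answers for a recognized intent."""
    return random.choice(pool[intent])
```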
[0040] During use, one or more of the probability distributions may be updated through a machine learning process, where the backend application determines probability distributions that are more reflective of statistically accurate patient models in real life and updates such probability distributions in the model 49. This is carried out in an UPDATE PROBABILITIES step 62 in Figure 6. The probability distributions are updated in this step 62 only if it is determined that the user is a medical professional/expert.
[0041] The method 40 next proceeds to an OBTAIN NLP MODEL step 64, wherein the processor 10 processes the input sentences 22 and associated intents 28 to obtain further input sentences 66 from which the associated intent 28 can be recognized. In this manner, more variations of input sentences 22 that map to each intent 28 are available. Specifically, in this OBTAIN NLP MODEL step 64, the finalised pool of intention-answer pairs is converted into YAML files, which are necessary for training an NLP/Sentiment Analysis tool hosted on a machine learning (ML) training server 70 shown in Figure 10, such as but not limited to the open-source RASA software, SpaCy and BERT, to obtain a baseline NLP model.
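The conversion of the intention pool into a training file might look like the following sketch. The layout follows RASA's documented `nlu.yml` format for NLU training data; the version string and helper name are assumptions.

```python
def to_nlu_yaml(intent_examples):
    """Render input sentences per intent as a RASA-style NLU YAML string.

    intent_examples: dict mapping intent names to lists of input sentences.
    """
    lines = ['version: "3.1"', "nlu:"]
    for intent, sentences in intent_examples.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {s}" for s in sentences)
    return "\n".join(lines)
```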
[0042] The open-source RASA software uses a DIET classifier (not shown) known to those skilled in the art. The DIET classifier identifies the contextual information of the input sentence 22 as well as allows for the recognition of entity keywords. This allows the baseline NLP model to pick up on specific discriminatory words in the input sentence 22, which is well suited for the use-case in medical contexts. The baseline NLP model serves only to process inputs. Furthermore, it is only able to accept textual inputs and outputs for training and testing. The NLP model also serves as a data store for storing the intents and associated output sentences.
[0043] Based on these input sentences 22 obtained from the experienced medical practitioners and associated intents, variations of the input sentences 22 bearing the same context are generated by the baseline NLP model. These further input sentences add to the stockpile of input sentences 22 that the system 2 can recognize and respond to. Through this training, data augmentation is performed, and this allows the baseline NLP model to be trained to recognize more general questioning structures in the input sentences 22. The baseline NLP model is trained to decode both the situational context as well as the medical context information of the input sentences. Furthermore, fewer upfront input sentences 22 from experienced medical practitioners are required to achieve robust NLP model performance. The ML training server 70 outputs a zip file that defines the baseline NLP model. The zip file requires specific software to run, which is known to those skilled in the art. The baseline NLP model is hosted on one or more model hosting servers 72 for scalability and redundancy purposes.
[0044] In this OBTAIN NLP MODEL step 64, the processor 10 also converts all the output sentences 24 to audio files using any suitable text-to-speech server 74. Each audio file is associated with an output sentence 24. Each audio file also has a unique reference number. The generated audio files are uploaded to the headset 4 as shown in Figure 10. At this point, the backend application is ready for use to start a conversation with the trainee doctor 8.
[0045] The method 40 next proceeds to a VIRTUAL PATIENT TRAINING step 78 shown in Figures 2 and 6. Specifically, the method 40 proceeds to a RECEIVE AUDIO INPUT step 80 shown in Figure 2, wherein the headset 4 receives via the microphone an input sentence 22 in the form of a spoken sentence from the user 8. The backend application converts this spoken input sentence from audio to text. This audio-to-text conversion is carried out using voice recognition software provided by the Microsoft Azure Services 82 shown in Figure 10. The audio input is transcribed into text using the Speech-To-Text services provided by the software. Optionally, the audio input may be pre-processed by the Conversational Artificial Intelligence (CAI) module 18 prior to being input into the voice recognition software. However, in view of the response time of the system 2, it is important that any such processing of the audio input does not take so long that it introduces a lag in the system response time. Time taken for any pre-processing of the audio input should be minimized, especially for online use of the system 2.
[0046] The method 40 next proceeds to a SPLICE INPUT SENTENCE step 84 shown in Figure 2, wherein the processor 10 splices the input sentence 22 in text form into one or more spliced sentences in a splicing model 86 as shown in Figure 10. The splicing model 86 is achieved by creating a machine learning algorithm that breaks an input sentence into its respective component spliced sentences 88, 90 as shown in Figure 11. The purpose of the splicing model is to allow the baseline NLP model to better handle input sentences 88 that do not carry meaningful intent as well as sentences 90 that contain multiple intentions. When the auditory input is transcribed into textual input and fed into the splicing model 86, each separated textual input can then be fed into the baseline NLP model individually to obtain the baseline NLP model's output response. This allows the baseline NLP model to ignore parts 88 of the input sentence 22 that contain no meaningful intent, since their outputs will fall below a certain confidence threshold and thus be ignored. For the remaining parts 90 that contain meaningful intent, the baseline NLP model will be able to produce an appropriate output sentence 24 for each of the individual intentions behind the original input sentence 22 and thus be able to give a multi-intention response. As shown in Figure 11, an example textual input sentence is “Um. I see. How are you? Where are you feeling pain?” The splicing model 86 will break this input sentence 22 up into four spliced sentences: “Um.”, “I see.” 88, “How are you?” and “Where are you feeling pain?” 90. The breaks in the input sentence 22 may correspond to pauses in the spoken input sentence 22.
[0047] The method 40 next proceeds to a PARAPHRASE SPLICED SENTENCES step 92 shown in Figure 2, wherein the processor 10 paraphrases each spliced sentence to obtain one or more paraphrased sentences 93 for each spliced sentence 88, 90 in a paraphrasing model 94.
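The splicing of step 84 can be approximated with a simple punctuation-based split on the Figure 11 example. This is only a stand-in sketch: the patent's splicing model 86 is a machine learning algorithm, not a regular expression.

```python
import re

def splice(sentence):
    """Break a transcribed input sentence into component spliced sentences
    at sentence-final punctuation (a crude proxy for spoken pauses)."""
    parts = re.split(r"(?<=[.?!])\s+", sentence.strip())
    return [p for p in parts if p]
```

For the example input sentence, this yields the four spliced sentences listed above.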
The method 40 next proceeds to an OBTAIN INTENT AND OUTPUT SENTENCE step 98. For each of these paraphrased sentences that is fed into the baseline NLP model, the baseline NLP model will output one or more intents and corresponding confidence levels as shown in Figure 11. Figure 12 illustrates how a spliced sentence 90 is paraphrased using the paraphrasing model 94.
[0048] The paraphrasing model may be created using the T5 transformer that is pre-trained on the C4 dataset, which includes 750 GB of English language text sourced from the web, as is known to those skilled in the art. Subsequently, the pre-trained model is used in the system 2 to transcribe text in the medical context. To ensure that the paraphrased sentences 93 maintain the contextual information, hard minimum and maximum thresholds on the number of characters of a sentence 88, 90 are set to determine if the sentence 88, 90 should be paraphrased. Optionally, in the paraphrasing model 94, incorrect translation of audio inputs by the Microsoft Azure Services 82 may be addressed by utilising an additional machine learning natural language processor to convert the translated auditory inputs into a paraphrased sentence with perfect grammar and sentence structure while maintaining the sentence intention.
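The hard character thresholds can be sketched as below. The bounds 12 and 120 are illustrative values chosen for the sketch, not the patent's actual thresholds.

```python
MIN_CHARS, MAX_CHARS = 12, 120  # illustrative bounds, not the patent's values

def should_paraphrase(sentence):
    """Only sentences within the hard character bounds are paraphrased, so
    short fillers and overly long inputs keep their original wording."""
    return MIN_CHARS <= len(sentence) <= MAX_CHARS
```

A filler like “Um.” falls below the minimum and is left unparaphrased.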
[0049] This paraphrasing model 94 may also be applied to the training dataset, more specifically the input sentences 22, in the OBTAIN NLP MODEL step to allow the baseline NLP model to also recognise the rephrased input sentences in addition to the input sentences 22. This paraphrasing model 94 is developed using a separate natural language processor designed to identify key vectors in the given input in order to produce a corrected paraphrased sentence with near-identical meaning.
[0050] The method 40 next proceeds to a PLAYBACK OUTPUT SENTENCE step 100 as shown in Figure 2, wherein the processor 10 plays back an output sentence for each intent classification with a confidence level that is above a predetermined threshold. That is, the processor 10 discards any unrecognized intentions and retains only intent classifications above the predetermined threshold. An output sentence 24 corresponding to each retained or recognized intent classification is then selected as the response output in chronological order, thus allowing the baseline NLP model to give a differentiated multi-response to a single auditory input sentence. Specifically, the baseline NLP model will randomly select an output sentence 24 associated with each recognized intent. The NLP model will send the reference number of the audio file corresponding to the selected output sentence 24 to the headset 4. The headset 4 will then obtain the audio file with the reference number stored thereon and play the audio file via the speakers of the headset 4 to the user 8. The animation clip associated with the virtual patient is also played on the display of the headset 4. The display will show the virtual patient 16 in a scene environment, for example in a clinic setting, seated beside the trainee doctor 8, responding to the questions of the trainee doctor 8 during a conversation therewith as shown in Figure 14. The virtual patient 16 is a 3D model created using the Unity3D software corresponding to the demographic variables selected earlier in the GENERATE VIRTUAL PATIENT step 42.
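The confidence-threshold filtering and random answer selection of step 100 might be sketched as follows. The threshold value of 0.7 and the data shapes are assumptions for illustration.

```python
import random

def select_responses(classifications, answers, threshold=0.7):
    """Discard intent classifications below the confidence threshold and
    pick a random associated answer for each retained intent, preserving
    chronological order for a multi-intention response.

    classifications: list of (intent, confidence) pairs from the NLP model
    answers:         dict mapping intents to possible output sentences
    """
    responses = []
    for intent, confidence in classifications:
        if confidence >= threshold:
            responses.append(random.choice(answers[intent]))
    return responses
```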
[0051] The method loops around the steps 80, 84, 92, 98, 100 where the assessment continues with the user 8 asking more questions 22 and obtaining more responses 24 from the backend application until the user 8 ends the training session. In this manner, the user is able to view and interact with the virtual patient 16 through the headset. The user 8 is able to ask the virtual patient 16 questions 22. Each question 22 is captured and analyzed by the backend application to obtain an intent 28 and to pick a response/output sentence 24 associated with the intent 28 and play it back to the user 8 via the speakers in the headset 4.
[0052] With this system 2, the user 8 can run through medical simulation scenarios. While the user 8 is undergoing the simulation session, relevant performance data is collected and stored in a database. At the end of the session, the database can then be referenced by the user to receive relevant performance insights. These insights include but are not limited to the time taken to complete the scenario, whether the appropriate questions were asked, how structured the conversation was and whether the diagnosis given was accurate. The system is also able to use the collected data to output suggestions on areas the user can improve upon as well as possible scenarios where the user can practice on the weaker areas. The NLP model can also be updated to provide more focus on the weaker areas.
[0053] Figure 15 is a block diagram illustrating typical elements of the server 6 that may be appropriately programmed for use in the clinical simulation system 2 described above. The elements include the programmable processor 10 connected to a system memory 102 via a system bus 104. The processor 10 accesses the system memory 102 as well as other input/output (I/O) channels 106 and peripheral devices 108. The server 6 further includes at least one program storage device 110, such as a CD-ROM, tape, magnetic media, EPROM, EEPROM, ROM or the like. The server 6 stores one or more computer programs that implement the method 40 of clinical simulation according to an embodiment of the present invention. The processor 10 reads and executes the one or more computer programs to perform the method 40. Each of the computer programs may be implemented in any desired computer programming language (including machine, assembly, high-level procedural, or object-oriented programming languages). In any case, the language may be a compiled or interpreted language.
[0054] Advantageously, the system provides a realistic platform for training trainee doctors. The trainee doctors can use the system to practice any time anywhere. There is no need for physical interaction between a trainee doctor and a real patient, thus reducing the risk of transmission of diseases. Furthermore, the system 2 allows for the creation of rare diseases which would otherwise be difficult to find and train a standardised patient. The system 2 is also able to better mimic the disease symptoms such as the movement animations of the patient as well as the visual symptoms of the disease.
[0055] The probabilistic model for generating a virtual patient in the system 2 allows for the scalability of the scenarios to generate infinitely many scenarios to train on. With the large number of virtual patients/scenarios, the system 2 also prevents users from memorising outcomes. It also gives replayability value. The probabilistic model also allows for finer control of the training scenario to focus on areas where there is a need for improvement. The system can also capture performance data for the different users and to let a user know where he or she stands compared to the others. Healthcare education providers will benefit from cost savings given that the burden of providing patients to trainee doctors for learning is reduced. Standardization in teaching can be achieved as well, thereby ensuring that every trainee doctor will be continuously exposed to evidence-based best practice care.
[0056] The conversational AI module is able to simulate realistic conversation between the trainee doctor and the virtual patient. The trainee doctor is able to interact with the created virtual patient within the virtual reality environment, whereby the trainee doctor can use natural informal speech to communicate with the virtual patient. Test results of the system 2 indicate the efficacy of the system in achieving realistic virtual patients that are capable of understanding free-flow speech in a medical context.
[0057] Furthermore, it was shown that the system was able to achieve a 90% accuracy rate in terms of the correct classification of the intentions of input sentences from the users using a tenfold cross validation. This was further examined through a confusion matrix diagram. Through the use of the additional splicing model and paraphrasing model, the entire workflow from obtaining a raw auditory input sentence to receiving an auditory output sentence is carried out without significant drops in the NLP model's working accuracy.
[0058] The NLP model created also has the ability to classify layman terms into the appropriate medical terminology and vice versa, whereas most commonly used NLP models are trained to only recognise and classify the medical terminology. Through the NLP model of the system 2, the auditory input sentence that can be used by the user will encompass a wider range of vocabulary and more closely mimics the speech used in the medical field. The NLP model with high classification accuracy can be translated to future virtual characters to emulate true-to-life conversations, creating possibilities of individual and multiplayer peer-to-peer real-time practice.
[0059] In addition, since the splicing model is able to handle non-meaningful intents very well, the original NLP model does not need to be trained to recognise these non-meaningful intents, since only parts of the input with meaningful intentions will cross the threshold for the eventual contribution towards the model output. Therefore, less data is required to train the NLP model, and only data pertaining to the relevant content is required.
[0060] Although the present invention is described as implemented in the above-described embodiment, it is not to be construed to be limited as such. It is to be appreciated that modifications and improvements may be made without departing from the scope of the present invention.
[0061] As an example, the method 40 can also be implemented in a standalone computing device, such as but not limited to a headset, a mobile device, a laptop, desktop, or a headset connected to another computing device. The method can also be deployed in other platforms, such as but not limited to, a chatbot to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
[0062] As a further example, the intent-answer pairs may alternatively be organized as shown in the tables in Figures 16-19. The table in Figure 16 has an “Expected Input (Intent)” field type, an “Output Sentence” field type and an “Animation” field type that are related to one another. Multiple records 120 are stored in the table. One such record includes “Can you show me where your pain is?” as an “Expected Input (Intent)” field, “It hurts at my lower back area.” as an “Output Sentence” field and “Animation Number: X” as an “Animation” field. As previously described, a virtual patient 16 consists of a set of conversational sentences 24 which, given the correct prompt 22, will produce an appropriate conversational sentence 24 as an output. Thus, every sentence output 24 given by the virtual patient 16 requires two parts: firstly, the expected input 22 and secondly, the output sentence 24 given the expected input 22. The expected input sentence 22 and the output sentence 24 correspond to the intent 28 of the user 8 and the utterance 24 of the virtual patient 16 respectively. The expected input 22 includes a list of possible sentences 22 that trigger corresponding outputs 24, and the output sentence 24 contains a single original sentence as well as a list of variations 66 that have identical contextual meaning to the original output sentence 24. These variations 66 may be obtained by paraphrasing the original sentence as described above. Each expected input 22 and associated output sentence 24 form a data pair. A virtual patient may be defined by multiple different data pairs for the purposes of mimicking natural conversation. This can be represented as a dictionary with the expected input and output sentence as labels/field types, each containing their own list. Each of the sentence data pairs also contains a corresponding animation 30. The animations are a list of preset animations, each containing a unique number.
These animations are then assigned to every sentence data pair. When an output sentence 24 is played back on the speaker of the headset 4 as a result of an input 22 from the user 8 in the virtual environment, the corresponding animation file will be played on the display of the headset 4.
[0063] The set of data pairs that make up the virtual patient 16 is automatically selected from a database of possible data pairs as described above. For each data pair, a list of category labels 122 is also appended to the data pair together with a set of probabilities 124 for each category label. The category labels 122 are predetermined string labels and are applied across all the sentence data pairs. The probabilities 124 are floating-point numbers which indicate the likelihood of the selection of the sentence data pair given the presence of the category. The category labels 122 and the associated probabilities 124 are shown in the table in Figure 17.
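One plausible shape for a sentence data pair with its appended category labels 122 and probabilities 124 is sketched below. Field names, category strings and numbers are assumptions; combining the probabilities by taking the maximum mirrors the “max” gate behaviour described later in the document.

```python
# Illustrative sentence data pair with appended category labels and
# selection probabilities (all values are made up for the sketch).
sentence_data_pair = {
    "expected_input": ["Can you show me where your pain is?"],
    "output_sentence": ["It hurts at my lower back area."],
    "categories": {"lower back pain": 0.6, "elderly": 0.8},
}

def selection_probability(pair, present_categories):
    """Likelihood of selecting this pair given the categories present,
    taken here as the highest matching category probability."""
    matching = [p for c, p in pair["categories"].items()
                if c in present_categories]
    return max(matching, default=0.0)
```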
[0064] Each probability may be represented as a conditional probability of the selection of the sentence data pair given the category labels associated with the sentence data pair:

P(sentence data pair | category labels)

[0065] The probabilities 124 are obtained from governmental databases as well as from expert opinions, which help determine the presence of possible sentences that can be said by a virtual patient given the prior presence of other variables. Since each sentence 24 also contains elements which may affect the presence or absence of other sentences, each sentence data pair has a list of associated category labels 126 appended to it. These associated category labels 126 approximately describe the key elements in each sentence and are also used to determine the selection of other sentence data pairs. These associated category labels 126 are shown in the table in Figure 18.
[0066] To ensure that there is no conflict between the selected sentences, the sentence data pairs together with the appended category labels 126 and probabilities are segmented into different sentence-type topics 128. These sentence-type topics indicate the nature of the content of the sentence, and all sentences which are similar to one another are placed together in one such sentence-type topic. This is to ensure that only one sentence may be selected out of these similar sentences. Organization of the data pairs with associated categories 126 and topics 128 is shown in the table in Figure 19. The database created will consist of all possible sentence data pairs, structured in the way shown in Figures 16-19.
[0067] As a further example, the virtual patient generation step based on a probabilistic model is described as being used in conjunction with subsequent specific training steps involving a conversation between a trainee doctor and a virtual patient. This is not to be construed to be limited as such. The virtual patient generation step may be standalone or used in conjunction with other steps not necessarily involving splicing and paraphrasing as described above.
[0068] As yet a further example, the probabilistic model described above includes nodes that are linked to one another. The links between two nodes act like “gates”. In the probabilistic model 49 shown in Figure 7, only two types of gates are shown: a “simple” gate 150 and an “either” gate 152. The “simple” gate 150 as shown in Figures 7 and 20 is one whereby one node contributes to the probability of the presence of another node connected thereto in the probabilistic model 49, but not vice versa.
[0069] The “either” gate 152, as shown in Figures 7 and 21, is similar to the “simple” gate 150. The main difference between this “either” gate 152 and the “simple” gate 150 is that this “either” gate 152 involves two connected nodes whereby either of the nodes can contribute to the presence of the other node. A first node may be related to a second node based on a first probability distribution 154 as shown in Figure 7. The second node may be related to the first node based on a second probability distribution 156. The first probability distribution 154 may be different from the second probability distribution 156. The value of each node therefore depends on the sequence in which the nodes are visited as the model is processed. Any node in the model may be used as a starting node for processing the model.
[0070] Optionally or additionally, the probabilistic model 49 may include a third type of gate, an “or” gate 158. This “or” gate 158 involves nodes that can contribute to more than one factor, but there are two factors that are mutually exclusive to one another. Therefore, this gate 158 prevents the selection of conflicting nodes by only allowing one of the nodes to be selected and the other node will be deactivated entirely together with all of its connections. In other words, a first node may be related to both a second node and a third node, wherein only one of the second node and the third node can be selected for processing following the processing of the first node. The node that is not selected and any downstream nodes connected thereto will not be processed. The value of the selected node will be used for obtaining intent and associated output sentences associated with that node.
[0071] Optionally or additionally, the probabilistic model 49 may include a fourth type of gate, a “max” gate 160. For the probabilistic model shown in Figure 23, almost every node in the constructed model has one or more connections that can contribute to the probability of its activation. Thus, in order to determine the probability of a node's activation, a “max” gate 160 is used, whereby out of all the possible probabilities that can contribute to the activation, only the highest probability is selected. The only time this “max” gate is not invoked is when there is only one probability contributing to the node's activation. In other words, the value of a third node may be obtained based on a first probability associating the first node and the third node and a second probability associating the second node and the third node. The value of the third node may be determined based on the higher of the first probability and the second probability. The value of the third node will be used for obtaining the intent and associated output sentences associated with that node.
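A minimal sketch of the “max” gate and the resulting activation decision follows; the function names and the injectable `rng` callable are illustrative assumptions.

```python
def max_gate(incoming):
    """'Max' gate: a node's activation probability is the highest of all
    probabilities contributed by its incoming connections."""
    return max(incoming)

def node_activates(incoming, rng):
    """A node activates when a uniform draw in [0, 1) falls below the gated
    activation probability (rng is any callable returning such a float)."""
    return rng() < max_gate(incoming)
```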
[0072] Therefore, through this series of connections via the gates, each node's activation probability is decided. If a node is determined not to be activated, the node as well as all of its possible downstream linkages will be deactivated. This naturally affects gates such as the “either” and “or” gates: the “either” gate is deconstructed into a “simple” gate, and the “or” gate is either entirely deconstructed or reformed into a “simple” gate depending on whether the deactivated node is the origin node or one of the “or” nodes respectively, as shown in Figure 24. In Figure 24, nodes 2, 3 and 5 are selected nodes while nodes 4 and 6 are deactivated nodes.
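The gate-deconstruction rules described above can be captured in a small lookup, sketched here under the assumption that a gate is identified by a type string and the deactivated participant by its role; these names are the sketch's own, not terminology from the specification.

```python
def deconstruct_gate(gate_type, deactivated_role):
    """Map a gate to its reduced form after one participant node is
    deactivated.

    - An 'either' gate always degenerates into a 'simple' gate, since
      only one direction of influence survives.
    - An 'or' gate collapses entirely (returns None) when its origin
      node is deactivated, but becomes a 'simple' gate when one of the
      two mutually exclusive option nodes is deactivated.
    - A 'simple' or 'max' gate is unaffected by this rule.
    """
    if gate_type == "either":
        return "simple"
    if gate_type == "or":
        return None if deactivated_role == "origin" else "simple"
    return gate_type
```

This mirrors the Figure 24 scenario: losing an “or” option leaves a “simple” gate over the surviving branch, while losing the origin removes the gate altogether.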
[0073] It should be noted that a single node may produce many gate outputs. For instance, a single node may participate in both a “simple” gate and an “or” gate, contributing separately to each. Another important consideration is that gates can be compounded together to form a complex hybrid gate. For example, the “or” gate or the “max” gate can be combined with the “either” gate to reflect more complex relationships between two nodes.
[0074] Through this combination of gates, a realistic and contextually coherent virtual patient 16 can be created that reflects real-world statistical data. Since one variable does not necessarily indicate the presence of another confounding variable with 100% certainty, the proposed probabilistic model is more indicative of how the disease variables relate to one another. Virtual patients created from such a probabilistic model will thus exhibit sufficient variation while remaining within realistic expectations.
[0075] It should be further appreciated by the person skilled in the art that one or more of the above modifications or improvements, not being mutually exclusive, may be further combined to form yet further embodiments of the present invention.

Claims

1. A clinical simulation method comprising: generating a virtual patient that is defined by a plurality of variables; obtaining intent and associated output sentences that are associated with at least some of the plurality of variables; storing the intent and associated output sentences into a data store; receiving an input sentence from a user; splicing the input sentence received from the user to obtain at least one spliced sentence; paraphrasing the at least one spliced sentence to obtain at least one paraphrased sentence; obtaining at least one intent associated with the at least one paraphrased sentence from the data store; and playing back to the user an output sentence associated with each of the at least one obtained intent.
2. The clinical simulation method according to Claim 1, wherein the intent is obtained from input sentences associated with the output sentences.
3. The clinical simulation method according to Claim 2, further comprising adding to the data store further input sentences obtained from each intent and associated input sentences.
4. The clinical simulation method according to any one of the preceding claims, further comprising adding to the data store further output sentences obtained by paraphrasing the output sentences.
5. The clinical simulation method according to any one of the preceding claims, wherein the input sentence obtained from the user is a spoken input sentence.
6. The clinical simulation method according to Claim 5, wherein splicing is based on pauses in the spoken input sentence.
7. The clinical simulation method according to any one of the preceding claims, wherein the data store is a natural language processing (NLP) model and wherein obtaining at least one intent associated with the at least one paraphrased sentence from the data store comprises obtaining at least one intent that has at least a predetermined confidence level as determined by the NLP model.
8. The clinical simulation method according to any one of the preceding claims, wherein playing back to the user an output sentence comprises playing back an audio recording of the output sentence.
9. The clinical simulation method according to Claim 8, wherein playing back an audio recording of the output sentence comprises playing back the audio recording of the output sentence via a headset of the user, wherein the audio recording is stored on the headset.
10. The clinical simulation method according to any one of the preceding claims, wherein the plurality of variables defining the virtual patient comprises a first variable, a second variable and a third variable having at least one of the following relationships: the first variable is related to the second variable via a first probability distribution and a second probability distribution that is different from the first probability distribution; the first variable is related to both the second variable and the third variable, wherein only one of the second variable and the third variable is selected for obtaining intent and associated output sentences that are associated therewith; and the third variable has a value that is obtained based on a higher of a first probability associating the first variable and the third variable and a second probability associating the second variable and the third variable.
11. A clinical simulation system comprising: a server operable to perform a method comprising: generating a virtual patient that is defined by a plurality of variables; obtaining intent and associated output sentences that are associated with at least some of the plurality of variables; storing the intent and associated output sentences into a data store; receiving an input sentence from a user; splicing the input sentence received from the user to obtain at least one spliced sentence; paraphrasing the at least one spliced sentence to obtain at least one paraphrased sentence; obtaining at least one intent associated with the at least one paraphrased sentence from the data store; and playing back to the user an output sentence associated with each of the at least one obtained intent.
12. The clinical simulation system according to Claim 11, wherein the intent is obtained from input sentences associated with the output sentences.
13. The clinical simulation system according to Claim 12, wherein the method further comprises adding to the data store further input sentences obtained from each intent and associated input sentences.
14. The clinical simulation system according to any one of Claims 11-13, wherein the method further comprises adding to the data store further output sentences obtained by paraphrasing the output sentences.
15. The clinical simulation system according to any one of Claims 11-14, wherein the input sentence obtained from the user is a spoken input sentence.
16. The clinical simulation system according to Claim 15, wherein splicing is based on pauses in the spoken input sentence.
17. The clinical simulation system according to any one of Claims 11-16, wherein the data store is a natural language processing (NLP) model and wherein obtaining at least one intent associated with the at least one paraphrased sentence from the data store comprises obtaining at least one intent that has at least a predetermined confidence level as determined by the NLP model.
18. The clinical simulation system according to any one of Claims 11-17, further comprising a headset communicatively coupled to the server, and wherein playing back to the user an output sentence comprises playing back an audio recording of the output sentence via the headset, wherein the audio recording is stored on the headset.
19. The clinical simulation system according to any one of Claims 11-18, wherein the plurality of variables defining the virtual patient comprises a first variable, a second variable and a third variable having at least one of the following relationships: the first variable is related to the second variable via a first probability distribution and a second probability distribution that is different from the first probability distribution; the first variable is related to both the second variable and the third variable, wherein only one of the second variable and the third variable is selected for obtaining intent and associated output sentences that are associated therewith; and the third variable has a value that is obtained based on a higher of a first probability associating the first variable and the third variable and a second probability associating the second variable and the third variable.
20. A program storage device readable by a computing device, tangibly embodying a program of instructions executable by the computing device to perform a clinical simulation method, the method comprising: generating a virtual patient that is defined by a plurality of variables; obtaining intent and associated output sentences that are associated with at least some of the plurality of variables; storing the intent and associated output sentences into a data store; receiving an input sentence from a user; splicing the input sentence received from the user to obtain at least one spliced sentence; paraphrasing the at least one spliced sentence to obtain at least one paraphrased sentence; obtaining at least one intent associated with the at least one paraphrased sentence from the data store; and playing back to the user an output sentence associated with each of the at least one obtained intent.
PCT/SG2022/050751 2021-10-27 2022-10-21 A clinical simulation system and method WO2023075683A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202111916W 2021-10-27

Publications (2)

Publication Number Publication Date
WO2023075683A2 true WO2023075683A2 (en) 2023-05-04
WO2023075683A3 WO2023075683A3 (en) 2023-06-22



