WO2024096307A1 - Method for operating a medical artificial intelligence model, and electronic device implementing same - Google Patents

Method for operating a medical artificial intelligence model, and electronic device implementing same

Publication number
WO2024096307A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
sequence
diagnosis
data
diagnostic
Application number
PCT/KR2023/013928
Other languages
English (en)
Korean (ko)
Inventor
김영학
전태준
김민경
강희준
안임진
권한슬
김윤하
서혜람
조하나
최희정
한지예
기가은
박서현
Original Assignee
재단법인 아산사회복지재단
울산대학교 산학협력단
Application filed by 재단법인 아산사회복지재단, 울산대학교 산학협력단
Publication of WO2024096307A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • EMR: electronic medical records
  • Data such as diagnosis codes, which are important for describing a patient's condition and other characteristics, are often not in a format that is easy to analyze. As a result, there is growing interest in effective representation methods that let artificial intelligence (AI)-based prediction research make efficient use of medical information recorded as codes.
  • AI: artificial intelligence
  • OHE: one-hot encoding
  • NLP: natural language processing
  • W2V: Word2Vec (skip-gram)
  • An electronic device for operating a medical artificial intelligence model includes a memory storing computer-executable instructions, and a processor that accesses the memory and executes the instructions. The instructions cause the processor to generate a diagnosis code sequence by sorting, according to a sorting criterion, a plurality of diagnosis codes included in a patient's electronic medical record (EMR); to apply the generated diagnosis code sequence to a medical data preprocessing model to obtain converted sequence data whose dimension is reduced relative to that of the diagnosis code sequence; and to operate the medical artificial intelligence model on the converted sequence data to generate predicted diagnostic data for the patient.
  • The processor may obtain a plurality of partial sequences by dividing the electronic medical record into time periods having a predetermined length and, for each of the partial sequences, may sort the one or more diagnosis codes included in that partial sequence.
  • the processor may delete overlapping diagnostic codes among diagnostic codes included in the partial sequence from the partial sequence.
  • the processor may sort one or more diagnostic codes included in the partial sequence based on the frequency of the diagnostic codes recorded in the electronic medical record.
  • the processor may obtain the converted sequence data from a transfer layer of the medical data preprocessing model.
  • the processor may fine tune the medical data preprocessing model based on diagnosis codes sorted by the frequency of the diagnosis codes at predetermined periods.
  • FIG. 1 is a diagram illustrating an electronic device that operates a medical artificial intelligence model according to an embodiment.
  • FIG. 2 is a flowchart showing a method of operating a medical artificial intelligence model according to an embodiment.
  • FIG. 3 is a diagram illustrating a diagnosis code in an electronic medical record (EMR) according to an embodiment.
  • FIG. 4 is a diagram illustrating a method of obtaining converted sequence data by applying a diagnosis code sequence to a medical data preprocessing model according to an embodiment.
  • FIG. 5 is a diagram illustrating a method of generating a plurality of diagnostic code sequences according to a sorting method according to an embodiment.
  • FIG. 6 is a diagram illustrating a method of generating a medical data preprocessing model by fine-tuning the BERT model according to an embodiment.
  • FIG. 7 is a diagram illustrating application of sequence data generated by one-hot encoding (OHE) to a medical artificial intelligence model according to an embodiment.
  • FIG. 8 is a diagram illustrating application of sequence data converted by a medical data preprocessing model to a medical artificial intelligence model according to an embodiment.
  • FIG. 9 is a diagram illustrating the performance of a medical data preprocessing model according to a method for sorting diagnosis code sequences according to an embodiment.
  • FIG. 10 is a diagram illustrating the performance of one-hot encoding (OHE) and a medical data preprocessing model according to an embodiment.
  • first or second may be used to describe various components, but these terms should be interpreted only for the purpose of distinguishing one component from another component.
  • a first component may be named a second component, and similarly, the second component may also be named a first component.
  • Each of phrases such as "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or any possible combination thereof.
  • FIG. 1 is a diagram illustrating an electronic device that operates a medical artificial intelligence model according to an embodiment.
  • the electronic device 100 may include a processor 110 and a memory 120.
  • the electronic device 100 may be a device that generates predicted diagnostic data for a patient's electronic medical record (EMR).
  • the electronic device 100 may generate a diagnosis code sequence by sorting diagnosis codes included in the electronic medical record.
  • the electronic device 100 may obtain converted sequence data by applying the diagnosis code sequence to a medical data preprocessing model.
  • the electronic device 100 may apply the converted sequence data to a medical artificial intelligence model to generate predicted diagnosis data for the patient's electronic medical record.
  • The processor 110 may, for example, execute software to control at least one other component (e.g., a hardware or software component) of the electronic device 100 connected to the processor 110, and may perform various data processing or computation. According to one embodiment, as at least part of the data processing or computation for operating a medical artificial intelligence model, the processor 110 may store instructions or data in the memory 120, process the instructions or data stored in the memory 120, and store the resulting data in the memory 120.
  • Memory 120 may include instructions executable by a computer.
  • the memory 120 may temporarily and/or permanently store various data and/or information required to operate at least one of a medical data preprocessing model and/or a medical artificial intelligence model.
  • the memory 120 may store at least one of a patient's electronic medical record, a diagnosis code sequence, converted sequence data, or diagnosis data.
  • FIG. 2 is a flowchart showing a method of operating a medical artificial intelligence model according to an embodiment.
  • an electronic device may generate a diagnostic code sequence.
  • the diagnostic code sequence may be an ordered sequence of diagnostic codes.
  • the diagnosis code may be a code included in the patient's electronic medical record that indicates the patient's diagnosis result.
  • the electronic device can generate a partial sequence by sorting diagnosis codes in an electronic medical record.
  • the electronic device can generate a diagnostic code sequence by combining partial sequences.
  • Electronic medical records may be medical records prepared by medical personnel that are entered, managed, and stored as electronic documents with electronic signatures.
  • Diagnosis codes may be codes of the International Statistical Classification of Diseases and Related Health Problems (ICD), which classifies a patient's death, disease, injury, or health condition.
  • Diagnosis codes may combine letters and numbers.
  • For example, the diagnosis codes may classify each disease into one of 21 chapters according to the source of infection, anatomical region, and other factors, with each code expressed in letters and numbers.
  • the partial sequence may include diagnosis codes sorted by period. For example, partial sequences may be divided into time periods with a predetermined time length, and a partial sequence corresponding to a certain period may include diagnostic codes belonging to that period.
  • the electronic device may generate partial sequences by dividing the entire period from the visit date (INDT) to the discharge date (OUDT) into 1-year units in the patient's electronic medical record.
  • the electronic device may add a predetermined parameter to each partial sequence.
  • the electronic device may generate a diagnostic code sequence by combining partial sequences to which a parameter (eg, a delimiter that separates partial sequences) is added. A method of generating a diagnostic code sequence will be described later in FIG. 5.
  • the electronic device may obtain converted sequence data by applying the diagnosis code sequence to the medical data preprocessing model.
  • the electronic device may obtain converted sequence data from the transfer layer of the medical data preprocessing model.
  • a description of the transfer layer is provided later in FIG. 4.
  • The medical data preprocessing model may be a pre-trained BERT (Bidirectional Encoder Representations from Transformers) language representation model.
  • the electronic device can create a medical data preprocessing model by fine-tuning a pre-trained BERT model. While the pre-trained BERT model is specialized for natural language processing, it may not be suitable for processing unstructured data such as diagnostic codes. Therefore, the electronic device can fine-tune the BERT model by applying diagnostic code sequences to the pre-trained BERT model. Electronic devices can use the fine-tuned BERT model as a medical data preprocessing model. For reference, a method of fine-tuning the BERT model to create a medical data preprocessing model is described later in FIG. 6.
  • the converted sequence data may be data that abstracts the relationship between a plurality of diagnosis codes existing in the diagnosis code sequence.
  • the converted sequence data may include feature vectors in which the features of the diagnosis code are expressed as numbers. A description of the converted sequence data will be provided later in FIG. 8.
  • the electronic device may operate the medical artificial intelligence model based on input of converted sequence data to the medical artificial intelligence model.
  • a medical artificial intelligence model may be a machine learning model that generates predicted diagnostic data for a patient based on input of converted sequence data.
  • Medical artificial intelligence models can be created through machine learning. Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to these.
  • a medical artificial intelligence model may include multiple artificial neural network layers.
  • The artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or a deep Q-network, or a combination of two or more of these, but is not limited to the examples described above.
  • the medical artificial intelligence model is described as a machine learning model for convenience of explanation, but it is not limited to this.
  • a medical artificial intelligence model may be a classification model composed of a decision tree to predict diagnostic data.
  • the medical artificial intelligence model may be an extreme gradient boosting (XGB) model.
  • a medical artificial intelligence model may include one or more decision trees.
  • a medical artificial intelligence model can ensemble the results of one or more decision trees.
  • a medical artificial intelligence model may ensemble the results of one or more decision trees to generate predicted diagnostic data for a patient.
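As a rough illustration of how a tree ensemble combines the outputs of individual decision trees, the sketch below sums the leaf scores of a few one-split trees and maps the total to a probability with the logistic function, which is how gradient-boosted binary classifiers typically form a prediction. The `stump` and `ensemble_probability` helpers, the thresholds, and the leaf scores are hypothetical and are not taken from the specification; a real model would be trained with a library rather than written by hand.

```python
import math


def stump(feature, threshold, left_score, right_score):
    """Build a one-split decision tree (hypothetical helper).

    The tree returns left_score when x[feature] < threshold,
    otherwise right_score.
    """
    def predict(x):
        return left_score if x[feature] < threshold else right_score
    return predict


def ensemble_probability(trees, x):
    """Sum the scores of all trees and squash the total with a sigmoid."""
    margin = sum(tree(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# Illustrative trees and input vector; all values are made up.
trees = [stump(0, 0.5, -0.4, 0.6), stump(1, 0.0, -0.2, 0.3)]
features = [0.8, -1.0]
probability = ensemble_probability(trees, features)
```

Here the first tree contributes 0.6 and the second contributes -0.2, so the combined score is positive and the predicted probability is above 0.5.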
  • Diagnostic data may include at least one of: diagnostic data regarding the possibility of discharge of an inpatient; diagnostic data regarding the possibility of death during hospitalization of a heart failure patient; diagnostic data related to a search for reading result sheets similar to a patient's reading result sheet (e.g., an X-ray, CT, MRI, or ultrasound reading result sheet); or diagnostic data relating to a treatment guide for a sepsis patient.
  • FIG. 3 is a diagram illustrating a diagnosis code in an electronic medical record (EMR) according to an embodiment.
  • the electronic medical record 300 may include at least one of the patient's personal information, the patient's hospital visit type, clinical data, or a diagnosis code 310.
  • the patient's personal information may include the patient's gender, patient's date of birth, and hospital visit ID.
  • the patient's hospital visit type may include visit types based on outpatient treatment and inpatient treatment.
  • Diagnosis code 310 may include multiple classifications.
  • Diagnosis codes H00 to H59 may include information related to diseases of the eye and adnexa.
  • Diagnosis codes H25 through H28 may include information related to disorders of the lens.
  • H25 contains information related to senile cataracts, and H25.8 may contain information related to other senile cataracts.
  • the electronic medical record 300 may include a diagnosis code 310 indicating disease information about senile cataracts corresponding to the 'H25' code.
  • In this specification, the 10th revision of the International Classification of Diseases (ICD-10) is mainly described as the diagnosis code 310, but the diagnosis code 310 is not limited thereto and may instead follow the 11th revision of the International Classification of Diseases (ICD-11).
  • The electronic device (e.g., the electronic device 100 in FIG. 1) may generate a diagnosis code sequence from diagnosis codes with the decimal part of the diagnosis code 310 removed (e.g., the 8 in the diagnosis code H25.8).
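Dropping the decimal part of a code can be sketched in a few lines. The helper name `strip_decimal` and the input list are hypothetical; the upper-casing and whitespace trimming are added assumptions about input hygiene, not steps stated in the specification.

```python
def strip_decimal(code: str) -> str:
    """Keep only the part of an ICD-10 code before the decimal point.

    Example: 'H25.8' becomes 'H25'. Codes without a decimal part
    are returned unchanged (after trimming and upper-casing).
    """
    return code.strip().upper().split(".", 1)[0]


# Illustrative input codes.
codes = ["H25.8", "H25", "i10", "J45.9"]
categories = [strip_decimal(code) for code in codes]
```

After this step, 'H25.8' and 'H25' map to the same category, which is what allows overlapping codes to be detected and removed later.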
  • FIG. 4 is a diagram illustrating a method of obtaining converted sequence data by applying a diagnosis code sequence to a medical data preprocessing model according to an embodiment.
  • An electronic device may perform first preprocessing 435 and second preprocessing 455.
  • the electronic device may obtain the diagnosis code sequence 430 through first preprocessing 435.
  • the electronic device may apply the diagnosis code sequence 430 to the medical data preprocessing model 440.
  • the medical data preprocessing model 440 may be a fine-tuned BERT model.
  • the electronic device may obtain the converted sequence data 460 through the second preprocessing 455.
  • the electronic device can apply the converted sequence data 460 to the medical artificial intelligence model 470.
  • the first preprocessing 435 may include generating a diagnosis code sequence 430 from the electronic medical record 400.
  • the electronic device may perform first preprocessing 435 to generate a diagnosis code sequence 430 including sorted diagnosis codes from the electronic medical record 400.
  • the diagnosis code sequence 430 may include a plurality of diagnosis codes 420 for the patient 410 .
  • Diagnostic code sequence 430 may include diagnostic codes sorted based on sorting criteria.
  • the electronic device may apply the diagnosis code sequence 430 obtained through the first preprocessing 435 to the medical data preprocessing model 440.
  • the electronic device may apply the diagnosis code sequence 430 to the medical data preprocessing model 440 to obtain at least one of the mask code probability 446 or the converted sequence data 460.
  • the electronic device can measure accuracy by applying the diagnosis code sequence 430 to the medical data preprocessing model 440.
  • the electronic device may apply the mask code sequence 442 to the medical data pre-processing model 440 to measure the accuracy of the medical data pre-processing model 440.
  • the mask code sequence 442 may represent a sequence in which an arbitrary diagnosis code in the diagnosis code sequence 430 is obscured by a mask token.
  • the electronic device may obtain the mask code probability 446 by applying the mask code sequence 442 to the medical data preprocessing model 440.
  • Mask code probability 446 may include prediction results for the diagnostic code masked by the mask token in mask code sequence 442.
  • the electronic device may obtain at least one prediction probability value among the prediction probability value of diagnosis code I10, the prediction probability value of I20, the prediction probability value of C22, or the prediction probability value of J45.
  • the electronic device can measure the accuracy of the medical data preprocessing model 440 based on the mask code probability 446.
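The masking and accuracy steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the `[MASK]` token string, the helper names `mask_one` and `top1_accuracy`, the rule that separators are never masked, and the probability values are all hypothetical. Accuracy here is top-1 accuracy, which is one plausible reading of "accuracy"; the specification does not name the metric.

```python
import random

MASK = "[MASK]"  # assumed mask token string


def mask_one(sequence, rng, separator="."):
    """Replace one randomly chosen diagnosis code with the mask token.

    Separator tokens are never masked (an assumption). Returns the
    masked copy, the masked position, and the hidden code.
    """
    candidates = [i for i, token in enumerate(sequence) if token != separator]
    position = rng.choice(candidates)
    masked = list(sequence)
    target = masked[position]
    masked[position] = MASK
    return masked, position, target


def top1_accuracy(predictions, targets):
    """Share of masked positions where the most probable code is correct."""
    hits = sum(max(p, key=p.get) == t for p, t in zip(predictions, targets))
    return hits / len(targets)


sequence = ["J45", ".", "J45", "I20", "N40", "."]
masked, pos, target = mask_one(sequence, random.Random(0))

# Made-up mask code probabilities for two masked positions.
predictions = [
    {"I10": 0.1, "I20": 0.6, "C22": 0.1, "J45": 0.2},
    {"I10": 0.5, "I20": 0.2, "C22": 0.1, "J45": 0.2},
]
targets = ["I20", "J45"]
accuracy = top1_accuracy(predictions, targets)
```

With these illustrative values the first prediction is correct and the second is not, so the accuracy is 0.5.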
  • the second preprocessing 455 may include generating converted sequence data 460 by applying the diagnosis code sequence 430 to the medical data preprocessing model 440.
  • the electronic device may perform second preprocessing 455 to obtain converted sequence data 460 by applying the diagnosis code sequence 430 to the medical data preprocessing model 440.
  • The electronic device may acquire the converted sequence data 460 from the transfer layer 450 of the medical data preprocessing model 440.
  • the transfer layer 450 may be at least one layer among one or more layers included in the medical data preprocessing model 440.
  • the medical data preprocessing model 440 may include a transformer block and a classification block.
  • the transformer block may be a block that outputs an intermediate output (e.g., converted sequence data 460) for input data (e.g., mask code sequence 442) of the medical data preprocessing model 440.
  • the classification block may be a block that outputs a prediction result (e.g., mask code probability 446) for input data (e.g., mask code sequence 442) of the medical data preprocessing model 440.
  • the electronic device may obtain converted sequence data 460 from the transfer layer 450 of the transformer block.
  • the converted sequence data 460 may include data converted into a numeric vector that abstracts the relationship between clinical information and diseases of the plurality of diagnosis codes 420 present in the diagnosis code sequence 430.
  • FIG. 5 is a diagram illustrating a method of generating a plurality of diagnostic code sequences according to a sorting method according to an embodiment.
  • An electronic device may generate a diagnostic code sequence 530 from the first code set 510 and the second code set 520.
  • the electronic device may generate the second code set 520 by removing overlapping diagnostic codes from the first code set 510 of the patient 500.
  • the first code set 510 may include diagnosis codes divided by one year based on the date of visit of the patient 500.
  • The first code set 510 may include one or more diagnosis codes for each year, counted from the date of visit of the patient 500.
  • the second code set 520 may include diagnostic codes in which overlapping diagnostic codes are removed from the first code set 510 on a one-year basis. Additionally, the second code set 520 may include diagnosis codes in which decimal numbers of the diagnosis code have been removed.
  • the electronic device may generate a partial sequence including a plurality of diagnostic codes on a yearly basis from the second code set 520. That is, the partial sequence may include a plurality of diagnosis codes that the patient 500 was diagnosed with during one year. For example, referring to FIG. 5, the partial sequence corresponding to year 1 in the second code set 520 may include the diagnosis code 'J45'. Additionally, the partial sequence corresponding to the second year in the second code set 520 may include diagnosis codes 'J45', 'N40', and 'I20'.
  • the electronic device can generate a partial sequence containing diagnostic codes for a period of one year.
  • the electronic device can generate the diagnosis code sequence 530 by combining the above-described partial sequences.
  • the electronic device may generate the diagnostic code sequence 530 by combining a plurality of partial sequences generated from the second code set 520.
  • The electronic device may generate a diagnosis code sequence 530 in which the partial sequences are combined by adding a separator (e.g., a period ('.')) for each one-year unit in the second code set 520.
  • For example, the diagnosis code sequence 530 may contain the diagnosis codes corresponding to the first year (e.g., 'J45') through the diagnosis codes corresponding to the fourth year (e.g., 'R60', 'Z85', 'J45', 'I20', etc.), combined by separators. In this specification, for convenience of explanation, the period is mainly described as the separator that distinguishes one-year units, but the separator is not limited thereto.
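The construction of a diagnosis code sequence from visit records can be sketched as below. The record layout (a list of `(visit_date, code)` tuples), the helper names, and the choice to append the separator after every one-year partial sequence are assumptions made for illustration; the visit dates are invented, and only the year-1 and year-2 codes follow the example given for FIG. 5.

```python
from datetime import date


def year_index(first_visit: date, visit: date) -> int:
    """0-based index of the one-year period that contains `visit`."""
    years = visit.year - first_visit.year
    if (visit.month, visit.day) < (first_visit.month, first_visit.day):
        years -= 1
    return years


def build_sequence(records, separator="."):
    """Group codes by one-year period, drop decimals and duplicates, and join.

    `records` is a list of (visit_date, code) tuples (hypothetical layout).
    """
    if not records:
        return []
    records = sorted(records, key=lambda r: r[0])
    first_visit = records[0][0]
    periods = {}
    for visit, code in records:
        category = code.split(".", 1)[0]          # remove the decimal part
        bucket = periods.setdefault(year_index(first_visit, visit), [])
        if category not in bucket:                # remove overlapping codes
            bucket.append(category)
    tokens = []
    for index in sorted(periods):
        tokens.extend(periods[index])
        tokens.append(separator)                  # mark the end of the period
    return tokens


records = [
    (date(2015, 3, 1), "J45.9"),
    (date(2015, 9, 10), "J45.0"),
    (date(2016, 4, 2), "J45"),
    (date(2016, 5, 20), "N40"),
    (date(2016, 11, 3), "I20.0"),
    (date(2016, 12, 1), "N40"),
]
sequence = build_sequence(records)
```

In this sketch the first year collapses to a single 'J45', and the second year keeps 'J45', 'N40', and 'I20' once each, in order of first appearance. Any reordering by a sorting criterion would be applied to each partial sequence before joining.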
  • The electronic device may obtain a diagnosis code sequence in which diagnosis codes are sorted according to a sorting criterion.
  • the sorting criteria may include at least one of random sorting, alphabetical order sorting, or frequency sorting.
  • Random sorting may be sorting diagnostic codes based on a random probability generated through random numbers.
  • Alphabetical ordering may be sorting according to the alphabet order included in the diagnosis code.
  • Frequency sorting may be sorting based on the number of occurrences of each diagnostic code in the second code set 520. For example, the electronic device may count the occurrences of every diagnostic code present in the second code set 520. Within each period, the electronic device may then place a diagnostic code with a higher frequency of occurrence ahead of a diagnostic code with a lower frequency of occurrence, so that the partial sequence for that period lists diagnostic codes in descending order of frequency.
  • That is, the electronic device may generate, for a given period, a partial sequence in which diagnostic codes are arranged from the most frequent to the least frequent. If two or more diagnostic codes have the same frequency of occurrence, the electronic device may sort them by order of occurrence, placing the diagnostic code that occurred first ahead of the others.
  • The electronic device may sort the one or more diagnostic codes included in each of the plurality of partial sequences.
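Frequency sorting with the tie-break described above can be sketched as follows. The function name `frequency_sort` and the four partial sequences are hypothetical; counts are taken over the whole de-duplicated code set, and ties are broken by the position at which a code first appears, which is one reading of "order of occurrence".

```python
from collections import Counter


def frequency_sort(partial_sequences):
    """Sort each partial sequence by descending overall code frequency.

    Ties are broken by first occurrence across all partial sequences.
    """
    flat = [code for seq in partial_sequences for code in seq]
    counts = Counter(flat)
    first_seen = {}
    for position, code in enumerate(flat):
        first_seen.setdefault(code, position)
    return [
        sorted(seq, key=lambda code: (-counts[code], first_seen[code]))
        for seq in partial_sequences
    ]


# Illustrative de-duplicated yearly code sets (made-up patient).
partial_sequences = [
    ["J45"],
    ["J45", "N40", "I20"],
    ["I20", "R60", "J45"],
    ["R60", "Z85", "J45", "I20"],
]
sorted_sequences = frequency_sort(partial_sequences)
```

Here 'J45' occurs four times, 'I20' three times, 'R60' twice, and 'N40' and 'Z85' once each, so every partial sequence starts with 'J45' and then 'I20'. Alphabetical sorting would simply be `sorted(seq)` per partial sequence, and random sorting could use `random.shuffle` on a copy.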
  • the first diagnosis code sequence 540 may represent a diagnosis code sequence generated by combining partial sequences in which diagnosis codes are sorted according to a random sorting standard.
  • the second diagnosis code sequence 550 may represent a diagnosis code sequence generated by combining partial sequences in which diagnosis codes are sorted according to an alphabet ordering standard.
  • the third diagnosis code sequence 560 may represent a diagnosis code sequence generated by combining partial sequences in which diagnosis codes are sorted according to a frequency sorting criterion.
  • the electronic device may apply the first diagnosis code sequence 540 to the first medical data preprocessing model 542.
  • the electronic device may apply the second diagnosis code sequence 550 to the second medical data preprocessing model 552.
  • the electronic device may apply the third diagnosis code sequence 560 to the third medical data preprocessing model 562.
  • Each of the first medical data preprocessing model 542 to the third medical data preprocessing model 562 may be a fine-tuned BERT model.
  • Although the medical data preprocessing models (e.g., the first medical data preprocessing model 542, the second medical data preprocessing model 552, and the third medical data preprocessing model 562) are shown as separate models, the embodiments are not limited to this.
  • a single medical data preprocessing model may be used repeatedly in operations that apply multiple diagnostic code sequences.
  • FIG. 6 is a diagram illustrating a method of generating a medical data preprocessing model by fine-tuning the BERT model according to an embodiment.
  • An electronic device may generate a medical data preprocessing model by fine-tuning (630) the pre-trained BERT model (620).
  • the electronic device may generate masking input 610 from a diagnostic code sequence 600.
  • the diagnosis code sequence 600 may be a sequence including diagnosis codes sorted based on frequency.
  • the masking input 610 may include diagnostic codes in which an arbitrary diagnostic code (for example, I40 in FIG. 6) in the diagnostic code sequence 600 is obscured by a mask token.
  • the electronic device may perform fine tuning (630) by applying the masking input (610) to the pre-trained BERT model (620).
  • The pre-trained BERT model 620 may not have strong prediction performance for diagnostic codes. Therefore, the electronic device may create the masking input 610 and perform fine-tuning 630 of the pre-trained BERT model 620 to obtain a medical data preprocessing model whose prediction performance on medical data is improved compared to the model before the fine-tuning 630.
  • The above-described BERT model 620 can be fine-tuned based on data containing pairs of a masking input 610 and a prediction output mapped to that masking input 610 (e.g., predicted probabilities for the diagnostic code hidden by the mask token). For example, the BERT model 620 can be fine-tuned to output the prediction output from the masking input 610. During fine-tuning, the BERT model 620 can generate a temporary output in response to the masking input 610 and can be adjusted to minimize the loss between the temporary output and the fine-tuning target.
  • The parameters of the BERT model 620 may be updated according to the loss. This fine-tuning may be performed, for example, on the electronic device on which the BERT model 620 runs, or through a separate server.
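One common way to write such a loss is the masked-token cross-entropy used in BERT-style training. The specification does not name the loss function or the update rule, so the formula below is an assumption, and every symbol in it is introduced here for illustration: $\tilde{s}$ is the masking input, $M$ is the set of masked positions, $c_i$ is the original diagnosis code at position $i$, $p_{\theta}$ is the model's predicted probability, and $\eta$ is a learning rate.

```latex
% Assumed masked-token cross-entropy loss and a plain gradient step;
% neither is stated in the specification.
\mathcal{L}(\theta) = -\sum_{i \in M} \log p_{\theta}\left(c_i \mid \tilde{s}\right),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} \mathcal{L}(\theta)
```

Under this reading, the loss shrinks as the model assigns higher probability to the diagnosis code hidden by each mask token, and the parameter update moves the model in that direction.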
  • FIG. 7 is a diagram illustrating application of sequence data generated by one-hot encoding (OHE) to a medical artificial intelligence model according to an embodiment.
  • An electronic device converts the diagnostic code sequence 700 into sequence data 720 through one-hot encoding (OHE) 710.
  • One-hot encoding 710 may be an encoding that generates a vector by assigning the value '1' to the index of the word to be expressed (e.g., a diagnosis code) and '0' to every other index.
  • The sequence data 720 may have a number of rows corresponding to the number of patients included in the diagnosis code sequence 700 and a number of columns corresponding to the number of diagnosis codes of the International Classification of Diseases.
  • The sequence data 720 may have an element value of '1' at the index of each diagnosis code included in the diagnosis code sequence 700.
  • the electronic device may apply the sequence data 720 to the medical artificial intelligence model 730.
  • Compared with the converted sequence data generated by the medical data preprocessing model (e.g., the converted sequence data 460 of FIG. 4), the sequence data 720 obtained by one-hot encoding 710 may be a matrix that expresses only the presence or absence of each diagnosis code.
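A presence/absence matrix of this kind can be sketched as follows. The function name `one_hot_matrix`, the five-code vocabulary, and the two patient sequences are hypothetical; in practice the vocabulary would span the full set of diagnosis codes, so each row would be far wider.

```python
def one_hot_matrix(patient_sequences, vocabulary, separator="."):
    """Build a matrix with one row per patient and one column per code.

    A cell is 1 if the patient's sequence contains that code, else 0.
    Separator tokens and codes outside the vocabulary are ignored.
    """
    index = {code: i for i, code in enumerate(vocabulary)}
    matrix = []
    for sequence in patient_sequences:
        row = [0] * len(vocabulary)
        for code in sequence:
            if code != separator and code in index:
                row[index[code]] = 1
        matrix.append(row)
    return matrix


# Toy vocabulary and patients, for illustration only.
vocabulary = ["C22", "I10", "I20", "J45", "N40"]
patients = [
    ["J45", ".", "J45", "I20", "N40", "."],
    ["I10", "."],
]
matrix = one_hot_matrix(patients, vocabulary)
```

The first patient's row marks 'I20', 'J45', and 'N40' but carries no information about the year in which each code appeared or how the codes relate to one another, which is the limitation the converted sequence data is meant to address.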
  • FIG. 8 is a diagram illustrating application of sequence data converted by a medical data preprocessing model to a medical artificial intelligence model according to an embodiment.
  • An electronic device may acquire the converted sequence data 830 from the transfer layer of the medical data preprocessing model 820.
  • the medical data preprocessing model 820 may be a BERT model fine-tuned through a sequence of diagnosis codes sorted based on the frequency 810 of the diagnosis codes.
  • The converted sequence data 830 may have a lower dimension than the number of International Classification of Diseases diagnosis codes (e.g., the number of columns of the sequence data 720 of FIG. 7).
  • the electronic device may apply the converted sequence data 830, in which the characteristics of the diagnosis code sequence 800 are reduced to a low dimension, to the medical artificial intelligence model 840.
  • the converted sequence data 830 may include data that abstracts the relationship between clinical information and diseases of the diagnosis codes of the diagnosis code sequence 800, compared to sequence data (e.g., sequence data 720 of FIG. 7).
  • The electronic device may improve the performance of the medical artificial intelligence model 840 by using the converted sequence data 830 described above.
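  • The dimension contrast described for FIG. 8 can be illustrated with a toy stand-in. The sketch below is not the fine-tuned BERT model: it assumes a random embedding table and mean pooling purely to show that a fixed-size vector can be smaller than the number of codes. `HIDDEN_DIM`, `build_embeddings`, and `embed_sequence` are hypothetical names.

```python
import random
from typing import Dict, List

HIDDEN_DIM = 3  # hypothetical size, chosen smaller than the vocabulary below


def build_embeddings(vocabulary: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Random per-code vectors; a trained model would learn these instead."""
    rng = random.Random(seed)
    return {code: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for code in vocabulary}


def embed_sequence(codes: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Mean-pool the per-code vectors into one fixed-size vector."""
    if not codes:
        raise ValueError("the diagnosis code sequence must not be empty")
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    for code in codes:
        for j, value in enumerate(embeddings[code]):
            pooled[j] += value
    return [value / len(codes) for value in pooled]


vocabulary = ["E11", "G20", "I10", "J45", "K21", "N18"]
embeddings = build_embeddings(vocabulary, HIDDEN_DIM)
vector = embed_sequence(["I10", "E11", "N18"], embeddings)

print(len(vocabulary), "one-hot columns vs", len(vector), "embedded dimensions")
```

    The output width depends only on `HIDDEN_DIM`, not on how many codes the vocabulary contains.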
  • FIG. 9 is a diagram illustrating the performance of a medical data preprocessing model according to a method for sorting diagnosis code sequences according to an embodiment.
  • An electronic device may obtain a predicted probability result by applying a diagnosis code sequence to a medical data preprocessing model.
  • the first medical data preprocessing model 900 to the third medical data preprocessing model 920 may be a fine-tuned BERT model.
  • the first medical data preprocessing model 900 may be a fine-tuned model based on diagnosis code sequences sorted by random sorting.
  • the second medical data preprocessing model 910 may be a fine-tuned model based on diagnosis code sequences sorted in alphabetical order.
  • the third medical data preprocessing model 920 may be a fine-tuned model based on diagnosis code sequences sorted by frequency sorting.
  • the prediction probability result may represent the result of a medical data preprocessing model predicting the correct diagnosis code of a mask token in an example of a diagnosis code sequence.
  • the electronic device can replace any diagnostic code with a mask token in the example diagnostic code sequence shown in FIG. 9.
  • The electronic device may obtain a predicted probability result by applying the diagnosis code sequence including the mask token to the medical data preprocessing model. For example, the electronic device may obtain a higher prediction probability result from the third medical data preprocessing model 920 than from the first medical data preprocessing model 900 and the second medical data preprocessing model 910.
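  • The three sorting methods compared in FIG. 9 and the mask-token replacement can be sketched as follows. The text does not state the direction of the frequency sort or how ties are broken, so most-frequent-first with alphabetical tie-breaking is an assumption, as are the names `sort_codes` and `mask_position`, the `[MASK]` string, and the toy corpus.

```python
import random
from collections import Counter
from typing import List, Mapping, Optional

MASK = "[MASK]"  # assumed mask token string


def sort_codes(codes: List[str], method: str,
               frequency: Optional[Mapping[str, int]] = None, seed: int = 0) -> List[str]:
    """Order one patient's diagnosis codes by the chosen criterion."""
    if method == "random":
        shuffled = list(codes)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if method == "alphabetical":
        return sorted(codes)
    if method == "frequency":
        if frequency is None:
            raise ValueError("frequency sorting needs code counts")
        # Assumption: most frequent first, ties broken alphabetically.
        return sorted(codes, key=lambda code: (-frequency[code], code))
    raise ValueError(f"unknown sorting method: {method}")


def mask_position(sequence: List[str], position: int) -> List[str]:
    """Return a copy of the sequence with one code replaced by the mask token."""
    masked = list(sequence)
    masked[position] = MASK
    return masked


# Toy corpus used only to count how often each code occurs.
corpus = [["I10", "E11"], ["I10", "G20"], ["I10", "E11", "J45"]]
frequency = Counter(code for record in corpus for code in record)

patient = ["J45", "E11", "I10"]
by_frequency = sort_codes(patient, "frequency", frequency)
masked = mask_position(by_frequency, 1)
print(by_frequency, masked)
```

    A model would then be asked to predict the code hidden behind the mask token, and the probability assigned to the correct code is the quantity compared across the three models.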
  • FIG. 10 is a diagram illustrating the performance of one-hot encoding (OHE) and a medical data preprocessing model according to an embodiment.
  • An electronic device (e.g., the electronic device 100 of FIG. 1) may obtain the ROC curve graph 1000.
  • the ROC curve graph 1000 may include an ROC curve.
  • the ROC curve can be a measure to evaluate results based on sensitivity and specificity.
  • The graph 1010 may represent the results of diagnostic data predicted for the patient by applying converted sequence data (e.g., converted sequence data 830 of FIG. 8), obtained through a medical data preprocessing model (e.g., medical data preprocessing model 820 of FIG. 8), to a medical artificial intelligence model (e.g., medical artificial intelligence model 840 of FIG. 8).
  • The graph 1020 may represent the results of diagnostic data predicted for the patient by applying sequence data (e.g., sequence data 720 of FIG. 7), obtained through one-hot encoding (e.g., one-hot encoding 710 of FIG. 7), to a medical artificial intelligence model (e.g., the medical artificial intelligence model 730 of FIG. 7).
  • results obtained using the medical data preprocessing model for diagnosis code G20 show better performance than the results obtained using one-hot encoding.
  • the results obtained using the medical data preprocessing model can show excellent performance in many diagnosis codes included in the International Classification of Diseases.
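  • For readers unfamiliar with ROC curves, the sketch below computes the curve points and the area under the curve from predicted scores. Sensitivity is the true positive rate and specificity is one minus the false positive rate. The labels and scores are made-up toy values, not results from the embodiment, and `roc_points` and `auc` are hypothetical helper names.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: 1 means the patient has the diagnosis, 0 means not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(points)
print(round(auc(points), 4))
```

    A curve closer to the top-left corner, and therefore a larger area, indicates better separation of positive and negative cases.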
  • the embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components.
  • The devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
  • the processing device may execute an operating system (OS) and software applications running on the operating system. Additionally, a processing device may access, store, manipulate, process, and generate data in response to the execution of software.
  • For ease of understanding, a single processing device may be described as being used; however, those skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements.
  • a processing device may include multiple processors or one processor and one controller. Additionally, other processing configurations, such as parallel processors, are possible.
  • Software may include a computer program, code, instructions, or a combination of one or more of these, which may configure a processing device to operate as desired or may independently or collectively instruct the processing device.
  • Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device.
  • Software may be distributed over networked computer systems and stored or executed in a distributed manner.
  • Software and data may be stored on a computer-readable recording medium.
  • the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.
  • A computer-readable medium may include program instructions, data files, data structures, etc., singly or in combination, and the program instructions recorded on the medium may be specially designed and constructed for the embodiment or may be known and available to those skilled in the art of computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical media such as CD-ROMs and DVDs, and magneto-optical media such as floptical disks.
  • Examples of program instructions include machine language code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.
  • the hardware devices described above may be configured to operate as one or multiple software modules to perform the operations of the embodiments, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Pathology (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

An electronic device for operating a medical artificial intelligence model, according to one embodiment, comprises: a memory for storing computer-executable instructions; and a processor that accesses the memory and executes the instructions, wherein the instructions cause the processor to generate a diagnosis code sequence, based on sorting criteria, from a plurality of diagnosis codes included in an electronic medical record (EMR) of a patient, apply the generated diagnosis code sequence to a medical data preprocessing model in order to acquire converted sequence data having a lower dimension than the diagnosis code sequence, and operate the medical artificial intelligence model by applying the converted sequence data to the input of the medical artificial intelligence model in order to generate predicted diagnostic data for the patient.
PCT/KR2023/013928 2022-11-01 2023-09-15 Procédé de fonctionnement d'un modèle d'intelligence artificielle médicale, et dispositif électronique mettant en oeuvre celui-ci WO2024096307A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220143646A KR20240067148A (ko) 2022-11-01 2022-11-01 의료 인공지능 모델 동작 방법 및 이를 수행하는 전자 장치
KR10-2022-0143646 2022-11-01

Publications (1)

Publication Number Publication Date
WO2024096307A1 (fr) 2024-05-10

Family

ID=90930725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/013928 WO2024096307A1 (fr) 2022-11-01 2023-09-15 Procédé de fonctionnement d'un modèle d'intelligence artificielle médicale, et dispositif électronique mettant en oeuvre celui-ci

Country Status (2)

Country Link
KR (1) KR20240067148A (fr)
WO (1) WO2024096307A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190208354A1 (en) * 2007-07-03 2019-07-04 Eingot Llc Records access and management
KR20190123609A (ko) * 2018-04-24 2019-11-01 네이버 주식회사 딥 어텐션 네트워크를 이용하여 환자 의료 기록으로부터 질병 예후를 예측하는 방법 및 시스템
US20200043579A1 (en) * 2018-08-06 2020-02-06 David McEwing Diagnositic and treatmetnt tool and method for electronic recording and indexing patient encounters for allowing instant search of patient history
KR20210068713A (ko) * 2019-12-02 2021-06-10 주식회사 피디젠 딥러닝 기반 다중의료데이터를 통한 질병의 진행 예측 분석 시스템
KR20220011979A (ko) * 2020-07-22 2022-02-03 삼성전자주식회사 언어 모델 및 이를 포함하는 전자 장치

Also Published As

Publication number Publication date
KR20240067148A (ko) 2024-05-16

Similar Documents

Publication Publication Date Title
Canayaz MH-COVIDNet: Diagnosis of COVID-19 using deep neural networks and meta-heuristic-based feature selection on X-ray images
Nguyen et al. Deepr: a convolutional net for medical records
US20240203599A1 (en) Method and system of for predicting disease risk based on multimodal fusion
US20210034813A1 (en) Neural network model with evidence extraction
WO2019164064A1 (fr) Système d'interprétation d'image médicale par génération de données d'apprentissage renforcé d'intelligence artificielle perfectionnée, et procédé associé
Lee et al. Machine learning in relation to emergency medicine clinical and operational scenarios: an overview
US11244755B1 (en) Automatic generation of medical imaging reports based on fine grained finding labels
CN113688248B (zh) 一种小样本弱标注条件下的医疗事件识别方法及系统
Lyu et al. A multimodal transformer: Fusing clinical notes with structured EHR data for interpretable in-hospital mortality prediction
Kale et al. Causal phenotype discovery via deep networks
Nigam Applying deep learning to ICD-9 multi-label classification from medical records
CN110534185A (zh) 标注数据获取方法、分诊方法、装置、存储介质及设备
US11763081B2 (en) Extracting fine grain labels from medical imaging reports
Jin et al. Automatic detection of hypoglycemic events from the electronic health record notes of diabetes patients: empirical study
Yu et al. Rare disease detection by sequence modeling with generative adversarial networks
Waheeb et al. An efficient sentiment analysis based deep learning classification model to evaluate treatment quality
Nguyen et al. Deep bidirectional LSTM for disease classification supporting hospital admission based on pre-diagnosis: a case study in Vietnam
Funkner et al. Negation Detection for Clinical Text Mining in Russian.
WO2024096307A1 (fr) Procédé de fonctionnement d'un modèle d'intelligence artificielle médicale, et dispositif électronique mettant en oeuvre celui-ci
Wang et al. Deeptriager: a neural attention model for emergency triage with electronic health records
Zhang et al. Section classification in clinical notes with multi-task transformers
Valmianski et al. Evaluating robustness of language models for chief complaint extraction from patient-generated text
US11809826B2 (en) Assertion detection in multi-labelled clinical text using scope localization
Yogarajan Domain-specific language models for multi-label classification of medical text
Yuan et al. Numerical Feature Transformation-Based Sequence Generation Model for Multi-Disease Diagnosis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23886004

Country of ref document: EP

Kind code of ref document: A1