US20200027567A1 - Systems and Methods for Automatically Generating International Classification of Diseases Codes for a Patient Based on Machine Learning

Info

Publication number
US20200027567A1
US20200027567A1
Authority
US
United States
Prior art keywords
icd, diagnostic, codes, lstm, descriptions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/207,119
Inventor
Pengtao Xie
Eric Xing
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Petuum Inc
Original Assignee
Petuum Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Petuum Inc filed Critical Petuum Inc
Priority to US16/207,119 priority Critical patent/US20200027567A1/en
Assigned to PETUUM INC. Assignors: XIE, Pengtao; XING, Eric
Publication of US20200027567A1 publication Critical patent/US20200027567A1/en
Legal status: Abandoned

Classifications

    • G16H50/20: Computer-aided diagnosis, e.g. based on medical expert systems
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F18/211: Selection of the most significant subset of features
    • G06F18/2136: Feature extraction based on sparsity criteria, e.g. with an overcomplete basis
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06N20/00: Machine learning
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N5/022: Knowledge engineering; knowledge acquisition
    • G06T7/0012: Biomedical image inspection
    • G06V10/82: Image or video recognition using neural networks
    • G16B40/00: ICT specially adapted for biostatistics or bioinformatics-related machine learning or data mining
    • G16H10/60: Patient-specific data, e.g. electronic patient records
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H20/10: Therapies relating to drugs or medications
    • G16H30/40: Processing medical images, e.g. editing
    • G16H50/70: Mining of medical data, e.g. analysing previous cases of other patients
    • G16H70/60: Medical references relating to pathologies
    • H04L67/104: Peer-to-peer [P2P] networks
    • G06F18/2431: Classification techniques, multiple classes
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06V2201/03: Recognition of patterns in medical or anatomical images
    • G06V30/274: Syntactic or semantic context, e.g. balancing
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • Y02A90/10: ICT supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present disclosure generally relates to machine learning for healthcare, and more particularly, to systems and methods that apply machine learning algorithms to diagnostic documents to automatically generate an international classification of diseases (ICD) code for a patient.
  • A machine learning (ML) system can automatically analyze multiple sources of information with rich structure; uncover medically meaningful hidden concepts from low-level records to help medical professionals easily and concisely understand the medical data; and create a compact set of informative diagnostic procedures and treatment courses, making healthcare recommendations thereupon.
  • the ICD is a healthcare classification system maintained by the World Health Organization. It provides a hierarchy of diagnostic codes for diseases, disorders, injuries, signs, symptoms, etc. It is widely used for reporting diseases and health conditions, assisting in medical reimbursement decisions, and collecting morbidity and mortality statistics, to name a few uses.
  • While ICD codes are important for making clinical and financial decisions, medical coding, which assigns proper ICD codes to a patient visit, is time-consuming, error-prone, and expensive. Medical coders review the diagnosis descriptions written by physicians in the form of textual phrases and sentences and (if necessary) other information in the electronic medical record of a clinical episode, then manually attribute the appropriate ICD codes by following the coding guidelines. Several types of errors frequently occur. First, the ICD codes are organized in a hierarchical structure. For a node representing a disease C, the children of this node represent the subtypes of C. In many cases, the difference between disease subtypes is very subtle, and it is common for human coders to select incorrect subtypes.
  • diagnosis descriptions and the textual descriptions of ICD codes are written in quite different styles even if they refer to the same disease.
  • the textual description of an ICD code is formally and precisely worded, while diagnosis descriptions are usually written by physicians in an informal and ungrammatical way, with telegraphic phrases, abbreviations, and typos.
  • a method of assigning a set of international classification of diseases (ICD) codes to a patient includes obtaining a diagnostic description vector from at least one diagnostic description record of the patient; and applying a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
  • a system for assigning a set of ICD codes to a patient includes a diagnostic description encoding module and an ICD code assignment module.
  • the diagnostic description encoding module is configured to obtain a diagnostic description vector from at least one diagnostic description record of the patient.
  • the ICD code assignment module is configured to apply a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
  • the processor is configured to generate representations of diagnostic descriptions in the form of diagnostic description vectors, and to generate representations of ICD codes in the form of ICD vectors.
  • the processor is further configured to process the diagnostic description vectors and the ICD vectors to obtain an importance score between each diagnostic description represented in a diagnostic description vector and each ICD code represented in an ICD vector, and to associate each diagnostic description with one or more ICD codes based on the importance scores.
  • FIG. 1 is a block diagram of a system for generating an international classification of diseases (ICD) code for a patient using a machine-learned algorithm.
  • FIG. 2 is a block diagram of a model used to develop and train the machine-learned algorithm of FIG. 1 .
  • FIG. 3 is a block diagram of the architectural layers and functionality of the model used to develop and train the machine-learned algorithm of FIG. 1 .
  • FIG. 4 is a block diagram of a computing device that embodies the system of FIG. 1 .
  • FIG. 5 is a block diagram of an apparatus that develops and trains the machine-learned algorithm of FIG. 1 .
  • the system comprises a neural architecture that automatically performs ICD coding based on an input corresponding to a patient's diagnosis descriptions.
  • the diagnosis descriptions may be input in the form of a physician's free-form writing.
  • the neural architecture has four aspects: First, the architecture uses a tree-of-sequences long short-term memory (LSTM) network to simultaneously capture the hierarchical relationship among ICD codes and the semantics of each ICD code. Second, the architecture utilizes an adversarial learning approach to reconcile the different writing styles of diagnosis descriptions and ICD code descriptions.
  • Third, the architecture utilizes isotonic constraints to preserve the importance order among codes, and an algorithm based on the alternating direction method of multipliers (ADMM) to solve the constrained problem.
  • Fourth, the architecture employs an attentional matching mechanism to perform many-to-one and one-to-many mappings between diagnosis descriptions and ICD codes.
  • an ICD coding system 100 includes a diagnostic description encoding module 102 and an ICD code assignment module 104 .
  • the diagnostic description encoding module 102 is configured to receive a diagnostic description record 106 for a subject patient and to produce a representation of the record as an encoded diagnostic description (DD) vector 108 .
  • the ICD code assignment module 104 receives the diagnostic description as the diagnostic description (DD) vector 108 and applies a previously-trained machine-learned algorithm 110 to the vector.
  • the machine-learned algorithm 110 determines relevant ICD codes corresponding to the diagnostic descriptions included in the diagnostic description vector 108 and outputs an assigned ICD code 112 for the patient.
  • the diagnostic description encoding module 102 is configured to extract information from the diagnostic description record 106 and derive the encoded diagnostic description (DD) vector 108 from the extracted information.
  • the diagnostic description record 106 may be handwritten diagnostic notes identifying one or more diagnoses of the patient.
  • the diagnostic description encoding module 102 employs an LSTM recurrent neural network (RNN) to encode the diagnosis descriptions.
  • An example of such a recurrent network is described in Martin Sundermeyer, Ralf Schluter, and Hermann Ney. LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association, 2012, the disclosure of which is herein incorporated by reference.
  • the diagnostic description encoding module 102 employs a sequential long short-term memory (SLSTM) network to encode each description individually.
  • the machine-learned algorithm 110 of the ICD code assignment module 104 of FIG. 1 is designed and trained using a model that consists of five modules: a diagnostic description encoding module 202 that encodes diagnostic descriptions; an ICD code description encoding module 204 that encodes ICD codes based on their textual descriptions; an adversarial reconciliation module 206 that reconciles writing styles; an attentional matching module 208 that matches diagnosis descriptions with ICD codes and assigns the ICD codes; and an embedded isotonic constraints module 210.
  • the functional architecture of the model of FIG. 2 is shown in FIG. 3 .
  • the diagnostic description encoding module 202 used to develop and train the machine-learned algorithm 110 is configured to encode diagnostic descriptions in the same way as the diagnostic description encoding module 102 of the ICD coding system 100.
  • the diagnostic description encoding module 202 is configured to receive diagnostic descriptions and generate a latent representation of the diagnostic descriptions in the form of an encoded DD vector 216 .
  • the diagnostic description encoding module 202 employs an LSTM recurrent neural network (RNN) to encode the diagnosis descriptions.
  • LSTM is a popular variant of the recurrent neural network. Due to its capacity for capturing long-range semantics in text, LSTM is widely used for language modeling and sequence encoding.
  • An LSTM recurrent network consists of a sequence of units, each of which models one item in the input sequence.
  • for each diagnosis description, a character-level LSTM network and a word-level LSTM network are used to obtain its hidden representation.
  • the reason for using character-aware encoding is that many medical terms share the same suffix denoting similar diseases, and the character-level LSTM captures such characteristics.
  • the hidden representations of the written diagnosis descriptions are denoted as h_1, …, h_m, where m is the number of diagnosis descriptions in one record.
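  • A minimal PyTorch sketch of this two-level encoding follows; the module name CharWordEncoder and all dimensions are illustrative assumptions rather than details from the disclosure:

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Hypothetical two-level encoder: a character-level LSTM builds each
    word's representation, and a word-level LSTM encodes the word sequence."""
    def __init__(self, n_chars, char_dim=16, word_dim=64, hidden_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, word_dim, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) character indices of one description
        _, (w, _) = self.char_lstm(self.char_emb(char_ids))  # final state per word
        word_vecs = w.squeeze(0).unsqueeze(0)                # (1, n_words, word_dim)
        _, (h, _) = self.word_lstm(word_vecs)                # final state of the sequence
        return h.view(-1)                                    # hidden representation h_j

# Applying the encoder to each of the m descriptions yields h_1, ..., h_m.
```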
  • the diagnostic description encoding module 202 employs a sequential long short-term memory (SLSTM) network to encode each description individually.
  • the weight parameters of this SLSTM are tied with those of the SLSTM adopted by the ICD code description encoding module 204 for encoding ICD code descriptions, as described below.
  • the ICD code description encoding module 204 is configured to receive ICD codes 212 and generate a latent representation of the codes in the form of an encoded ICD vector 218 .
  • Each ICD code has a description (a sequence of words) that tells the semantics of this code.
  • the ICD code description encoding module 204 adopts the same two-level LSTM architecture, i.e., character-level and word-level, used for diagnostic description encoding, to obtain the hidden representation of its textual description.
  • the parameters of the neural networks for the ICD code description encoding module 204 and the diagnostic description encoding module 202 are not tied, in order to learn different language styles of these two sets of texts.
  • the hidden representations of different ICD codes are denoted as u_1, …, u_n, where n is the total number of codes.
  • the ICD code description encoding module 204 employs a sequential LSTM (SLSTM) to encode this description.
  • the inputs of the tree-of-sequences LSTM (TLSTM) include the code hierarchy and the hidden states of individual codes produced by the SLSTMs. It consists of a bottom-up TLSTM and a top-down TLSTM, which produce two hidden states, h↑ and h↓, at each node in the tree.
  • the ICD code description encoding module 204 takes the textual descriptions of the ICD codes 212 and their hierarchical structure as inputs and produces a latent representation for each code and includes the representation in an ICD vector 218 .
  • the representation aims at simultaneously capturing the semantics of each code and the hierarchical relationship among codes.
  • the model can avoid selecting codes that are subtypes of the same disease and promote the selection of codes that are clinically correlated.
  • Sequential LSTM. An SLSTM network is a special type of recurrent neural network that (1) learns the latent representations (which usually reflect certain semantic information) of words and (2) models the sequential structure among words.
  • each word t is allocated an SLSTM unit, which consists of the following components: input gate i_t, forget gate f_t, output gate o_t, memory cell c_t, and hidden state s_t.
  • These components are computed as follows:
  • i_t = σ(W^(i) s_{t-1} + U^(i) x_t + b^(i))
  • f_t = σ(W^(f) s_{t-1} + U^(f) x_t + b^(f))
  • o_t = σ(W^(o) s_{t-1} + U^(o) x_t + b^(o))
  • ĉ_t = tanh(W^(c) s_{t-1} + U^(c) x_t + b^(c))
  • c_t = i_t ⊙ ĉ_t + f_t ⊙ c_{t-1}
  • s_t = o_t ⊙ tanh(c_t)
  • x_t is the embedding vector of word t.
  • W, U are component-specific weight matrices and b are bias vectors.
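  • For illustration, an SLSTM unit can be transcribed directly from the transition equations above; in this NumPy sketch, the parameter shapes held in P are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slstm_step(x_t, s_prev, c_prev, P):
    """One SLSTM unit, following the transition equations above. P holds the
    component-specific weight matrices W, U and bias vectors b (assumed shapes)."""
    i_t = sigmoid(P["W_i"] @ s_prev + P["U_i"] @ x_t + P["b_i"])    # input gate
    f_t = sigmoid(P["W_f"] @ s_prev + P["U_f"] @ x_t + P["b_f"])    # forget gate
    o_t = sigmoid(P["W_o"] @ s_prev + P["U_o"] @ x_t + P["b_o"])    # output gate
    c_hat = np.tanh(P["W_c"] @ s_prev + P["U_c"] @ x_t + P["b_c"])  # candidate cell
    c_t = i_t * c_hat + f_t * c_prev                                # memory cell
    s_t = o_t * np.tanh(c_t)                                        # hidden state
    return s_t, c_t
```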
  • Tree-of-sequences LSTM. A bi-directional tree LSTM (TLSTM) captures the hierarchical relationships among codes.
  • the inputs of this TLSTM include the code hierarchy and the hidden states of individual codes produced by the SLSTMs. It consists of a bottom-up TLSTM and a top-down TLSTM, which produce two hidden states, h↑ and h↓, at each node in the tree.
  • the transition equations among components follow the same gated form as the SLSTM equations above, with the hidden states of the node's children serving as the recurrent inputs.
  • s is the SLSTM hidden state that encodes the name of the concept C.
  • W, U, b are component-specific weight matrices and bias vectors.
  • for a leaf node having no children, its only input is the SLSTM hidden state s, and no forget gates are needed; its transition equations reduce to the gated form above with the child terms dropped.
  • in the top-down TLSTM, a non-root node has the following components: an input gate i↓, a forget gate f↓, an output gate o↓, a memory cell c↓, and a hidden state h↓.
  • the transition equations again follow the gated form above, where h↓(p) and c↓(p), the top-down TLSTM hidden state and memory cell of the parent of this node, serve as the recurrent inputs.
  • for the root node, h↓ cannot be computed using the above equations. Instead, h↓ is set to h↑ (the bottom-up TLSTM hidden state generated at the root node). h↑ captures the semantics of all codes, which is then propagated downwards to each individual code via the top-down TLSTM dynamics.
  • the bottom-up TLSTM composes the semantics of children (representing sub-codes) and merges them into the current node, which hence captures child-to-parent relationship.
  • the top-down TLSTM makes each node inherit the semantics of its parent, which captures parent-to-child relation. As a result, the hierarchical relationship among codes is encoded in the hidden states.
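  • A hedged sketch of this bidirectional pass is given below; the use of standard LSTM cells and the summation of child states are plausible simplifications, not the disclosure's exact transition equations:

```python
import torch
import torch.nn as nn

class TreeOfSequencesLSTM(nn.Module):
    """Sketch of the bidirectional tree LSTM: a bottom-up pass composes children
    into their parent, and a top-down pass propagates the parent state back to
    each child. Each node is assumed to carry .children and its SLSTM state .s
    as a (1, dim) tensor (hypothetical node structure)."""
    def __init__(self, dim):
        super().__init__()
        self.up_cell = nn.LSTMCell(dim, dim)    # input: SLSTM state s of the code
        self.down_cell = nn.LSTMCell(dim, dim)  # input: bottom-up state of the node

    def bottom_up(self, node):
        child_states = [self.bottom_up(ch) for ch in node.children]
        if child_states:  # internal node: summed child states act as the context
            h_sum = torch.stack([h for h, _ in child_states]).sum(0)
            c_sum = torch.stack([c for _, c in child_states]).sum(0)
        else:             # leaf node: only input is the SLSTM hidden state s
            h_sum = torch.zeros_like(node.s)
            c_sum = torch.zeros_like(node.s)
        node.h_up, node.c_up = self.up_cell(node.s, (h_sum, c_sum))
        return node.h_up, node.c_up

    def top_down(self, node, parent_state=None):
        if parent_state is None:  # root: h_down is set to h_up
            node.h_down, node.c_down = node.h_up, node.c_up
        else:                     # non-root: inherit from the parent's state
            node.h_down, node.c_down = self.down_cell(node.h_up, parent_state)
        for ch in node.children:
            self.top_down(ch, (node.h_down, node.c_down))
```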
  • diversity-promoting regularization may be leveraged to improve the performance of the ICD code description encoding module 204 .
  • Diversity-promoting regularization imposes a structural constraint on parameters of the ICD code description encoding module 204 , which reduces the model capacity and therefore improves generalization performance on unseen data.
  • overfitting may be alleviated using a diversity-promoting regularization in the form of 1) a uniform eigenvalue regularizer (UER) applied to an LSTM network, or 2) angular constraints.
  • uncorrelation among components may be characterized from a statistical perspective by treating components as random variables and measuring their covariance, which is proportional to their correlation.
  • A ∈ ℝ^(d×m) denotes the component matrix whose k-th column is the parameter vector a_k of component k.
  • a row view of A may be used, where each component is treated as a random variable and each row vector ã_i^T is a sample drawn from the random vector formed by the m components.
  • An empirical covariance matrix may then be computed with the components as G = (1/d) A^T A.
  • If A is a full-rank matrix and m ≤ d, then G is a full-rank matrix with rank m.
  • an eigenvector u_k of the covariance matrix G represents a principal direction of the data points, and the associated eigenvalue λ_k tells the variability of points along that direction. The larger λ_k is, the more spread out the points are along the direction u_k.
  • the level of disparity among eigenvalues indicates the level of correlation among the m components (random variables). The more different the eigenvalues are, the higher the correlation is. Considering this, the uniformity among eigenvalues of G can be utilized to measure how uncorrelated the components are.
  • the eigenvalues are also related to the other factor of diversity, evenness, which is likewise used to measure diversity.
  • each component is assigned an importance score. When the eigenvectors are parallel to the coordinate axes, the eigenvalues reflect the variances of the components. Analogous to principal component analysis, which posits that random variables with larger variance are more important, the present embodiment may use variance to measure importance. According to the evenness criterion, the components are more diverse if their importance scores match, which motivates encouraging the eigenvalues to be uniform.
  • the eigenvalues are encouraged to be even in both cases: (1) when the eigenvectors are not aligned with the coordinate axes, evenness reduces the correlation of components; (2) when the eigenvectors are aligned with the coordinate axes, evenness ensures that different components contribute equally in modeling data.
  • the eigenvalues may be normalized into a probability simplex, and the discrete distribution parameterized by the normalized eigenvalues may then be encouraged to have small Kullback-Leibler (KL) divergence with the uniform distribution.
  • A^T A may be set to be positive definite.
  • the distribution p(X) is set to be "close" to a uniform distribution.
  • log(·) denotes the matrix logarithm.
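  • As an illustrative sketch (not the disclosure's implementation), the regularizer can be evaluated through an eigendecomposition rather than the matrix logarithm, since tr(G log G) equals Σ_k λ_k log λ_k for a positive definite G:

```python
import numpy as np

def uniform_eigenvalue_regularizer(A):
    """KL divergence between the normalized eigenvalues of G = (1/d) A^T A and
    the uniform distribution; smaller values indicate more diverse components."""
    d, m = A.shape
    G = A.T @ A / d                       # empirical covariance of the m components
    lam = np.linalg.eigvalsh(G)           # eigenvalues (positive definite assumed)
    p = lam / lam.sum()                   # normalize into a probability simplex
    return float(np.sum(p * np.log(p)) + np.log(m))  # KL(p || uniform)
```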
  • UER then may be applied to promote diversity.
  • letting ℒ(A) denote the objective function of an ML model, a UE-regularized ML problem can be defined as min_A ℒ(A) + λ·Ω_UE(A), where Ω_UE(A) is the uniform eigenvalue regularizer and λ is a tradeoff parameter.
  • Compared with previous diversity-promoting regularizers, UER has the following benefits: (1) it measures the diversity of all components in a holistic way, rather than reducing to pairwise dissimilarities as other regularizers do, which enables UER to capture global relations among components; (2) unlike determinant-based regularizers that are sensitive to vector scaling, UER is derived from normalized eigenvalues, where the normalization effectively removes scaling; (3) UER is amenable to computation. First, unlike the decorrelation regularizer that is defined over data-dependent intermediate variables and thus incurs computational inefficiency, UER is directly defined on model parameters and is independent of data. Second, unlike regularizers that are non-smooth, UER is a smooth function.
  • smooth functions are more amenable for deriving optimization algorithms than non-smooth functions.
  • the dominating computation in UER is matrix logarithm. It does not substantially increase computational overhead as long as the number of components is not too large (e.g., less than 1000).
  • the LSTM network is a type of recurrent neural network that is better at capturing long-term dependencies in sequential modeling.
  • at each position t, the input is x_t; there is an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t, and a hidden state h_t.
  • the transition equations among them are:
  • i_t = σ(W^(i) x_t + U^(i) h_{t-1} + b^(i))
  • f_t = σ(W^(f) x_t + U^(f) h_{t-1} + b^(f))
  • o_t = σ(W^(o) x_t + U^(o) h_{t-1} + b^(o))
  • ĉ_t = tanh(W^(c) x_t + U^(c) h_{t-1} + b^(c))
  • c_t = i_t ⊙ ĉ_t + f_t ⊙ c_{t-1}
  • h_t = o_t ⊙ tanh(c_t)
  • the LSTM network is applied for cloze-style reading comprehension (CSRC).
  • Near-orthogonality may be used to represent "diversity", using a regularization approach of angular constraints (ACs), in which the angle between components is constrained to be close to π/2, encouraging the components to be close to orthogonal. Analysis shows that the closer to π/2 the angles are, the smaller the estimation error is and the larger the approximation error is. The best tradeoff between these two errors can be explored by properly tuning the angles.
  • An algorithm based on the alternating direction method of multipliers (ADMM) solves the angle-constrained problems.
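  • The sketch below is a soft-penalty stand-in for the angular constraints; the disclosure enforces them as hard constraints solved with ADMM, which this simple penalty does not reproduce:

```python
import math
import torch

def angular_penalty(A, eps=0.1):
    """Penalize component pairs (columns of A) whose pairwise angle deviates
    from pi/2 by more than eps. This is an illustrative relaxation of the
    hard angular constraints, not the ADMM-based solver."""
    W = torch.nn.functional.normalize(A, dim=0)             # unit-norm columns
    cos = W.T @ W - torch.eye(A.shape[1], dtype=A.dtype)    # pairwise cosines, diagonal removed
    # |angle - pi/2| <= eps  is equivalent to  |cos(angle)| <= sin(eps)
    excess = torch.clamp(cos.abs() - math.sin(eps), min=0.0)
    return (excess ** 2).sum()
```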
  • the writing styles of diagnostic descriptions (DDs) and code descriptions (CDs) are largely different, which makes the matching between a DD and a CD error-prone.
  • an adversarial learning approach reconciles the writing styles.
  • On top of the latent-representation DD vectors 216, a discriminative network is built to distinguish which inputs are DDs and which are CDs.
  • the diagnostic description encoding module 202 and the ICD code description encoding module 204 try to make such a discrimination impossible. By doing this, the learned representations are independent of the writing styles and facilitate more accurate matching.
  • an adversarial learning approach is used to reconcile the different writing styles of diagnosis descriptions and code descriptions.
  • the basic idea is: after encoding, if a description cannot be discerned to be a DD or a CD, then the difference in their writing styles has been eliminated.
  • a discriminative network included in the adversarial reconciliation module 206 takes the encoding vector of a diagnosis description as input and tries to identify it as a DD or CD.
  • the diagnostic description encoding module 202 and the ICD code description encoding module 204 adjust their weight parameters so that such a discrimination is difficult for the discriminative network to achieve.
  • the SLSTM encoding vectors of CDs are used as the input of the discriminative network rather than using the TLSTM encodings since the latter are irrelevant to writing styles.
  • Adversarial learning is performed by solving a min-max problem between the encoders (with weights W_s, W_e) and the discriminative network (with weights W_d); in the overall training objective (Eq. 13) this appears as the term max_{W_d}(−ℓ_adv(W_s, W_e)).
  • the discriminative network tries to differentiate DDs from CDs by minimizing this classification loss while the encoder maximizes this loss so that DDs and CDs are not distinguishable.
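  • The min-max game can be sketched as follows; the discriminator architecture, the 128-dimensional encodings, and the binary cross-entropy loss are illustrative assumptions, and training alternates optimizer steps on the two returned objectives:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator: given an encoding vector, predict DD vs. CD.
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def adversarial_losses(dd_vecs, cd_vecs):
    """Returns (discriminator loss, encoder loss): the discriminator minimizes
    the classification loss, while the encoders maximize it (its negation),
    driving the two writing styles to be indistinguishable."""
    x = torch.cat([dd_vecs, cd_vecs], dim=0)
    y = torch.cat([torch.ones(dd_vecs.shape[0], 1),    # label 1 for DDs
                   torch.zeros(cd_vecs.shape[0], 1)])  # label 0 for CDs
    d_loss = bce(discriminator(x), y)
    return d_loss, -d_loss
```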
  • the attentional matching module 208 is configured to map diagnostic descriptions to ICD codes.
  • the encoded DD vectors 216 and the encoded ICD vectors 218 are fed into the attentional matching module 208 to perform code assignments.
  • the attentional matching module 208 allows multiple diagnostic descriptions to be matched to a single code and allows a single diagnostic description to be matched to multiple codes.
  • An order of importance among codes is incorporated by the isotonic constraints module 210 . These constraints regulate the weight parameters of the model so that codes with higher importance are given larger prediction scores.
  • the attentional matching module 208 disclosed herein is configured to take all diagnosis descriptions into account during coding by adopting an attention strategy.
  • the attentional matching module 208 provides a recipe for choosing which diagnosis descriptions are important when performing coding. For the i-th ICD code, an importance score or attention score a_{i,j} on the j-th diagnosis description is calculated as u_i^T h_j.
  • the attentional matching module 208 may utilize these attention scores based on a hard selection mechanism or a soft attention mechanism.
  • the hard selection mechanism is based on the assumption that the most related diagnosis description plays a decisive role when assigning ICD codes.
  • the dominating diagnosis is defined as the one that has the maximum attention score among all diagnosis descriptions.
  • a soft-attention mechanism may be used to calculate an attention score or importance score between a diagnostic description and a plurality of ICD codes.
  • An example of such a mechanism is described in Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, 2014, the disclosure of which is herein incorporated by reference.
  • the soft-attention mechanism applies a softmax function to normalize the attention scores among all diagnosis descriptions into a probability simplex.
  • the normalized attention scores are utilized as the weights of different diagnosis descriptions.
  • the weighted average over the hidden representations of different diagnosis descriptions is used as the attentional hidden vector. In this way, the attentional hidden vector can take into account all diagnosis descriptions with varying levels of attention.
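  • A compact sketch of the soft attentional matching follows; U and H stack the code encodings u_i and description encodings h_j, while the scoring parameters w_p and b_p are hypothetical additions to complete the example:

```python
import torch

def attentional_matching(U, H, w_p, b_p):
    """U: (n_codes, dim) code encodings; H: (m, dim) description encodings.
    Attention scores a_{i,j} = u_i^T h_j are softmax-normalized over the m
    descriptions, and the weighted average of the h_j forms the attentional
    hidden vector for each code."""
    scores = U @ H.T                      # (n_codes, m): a_{i,j} = u_i^T h_j
    attn = torch.softmax(scores, dim=1)   # normalize into a probability simplex
    context = attn @ H                    # (n_codes, dim) attentional hidden vectors
    logits = (context * U).sum(dim=1) * w_p + b_p  # hypothetical prediction layer
    return torch.sigmoid(logits)          # per-code assignment probabilities
```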
  • the mapping from diagnostic descriptions to codes is not one-to-one.
  • a code is assigned only when a certain combination of K (1 ≤ K ≤ M) diseases simultaneously appears within the M diagnostic descriptions, and the value of K depends on this code.
  • among these K diseases, their importance in determining the assignment of this code differs; for the remaining M-K diagnostic descriptions, the importance is considered to be zero.
  • the attention score between code n and description m is a_{n,m} = u_n^T h_m.
  • the weight parameters W of the model are trained using the data of L patient visits.
  • W includes the SLSTM weights W_s, the TLSTM weights W_t, and the weights W_p in the final prediction layer.
  • W can be learned by minimizing the following prediction loss: min_W Σ_{l=1}^{L} Σ_n CE(p_n^(l), y_n^(l)), where y_n^(l) indicates whether code n appears in the ground truth of visit l.
  • p_n^(l) is the predicted probability that code n is assigned to patient visit l, and p_n^(l) is a function of W.
  • CE(·, ·) is the cross-entropy loss.
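  • A minimal sketch of this prediction loss follows (the isotonic constraints of Eq. 13 are omitted); the tensor shapes are assumptions:

```python
import torch.nn.functional as F

def prediction_loss(p, y):
    """Sums CE(p_n^(l), y_n^(l)) over the L visits and n codes: p[l, n] is the
    predicted probability that code n is assigned to visit l, and y[l, n] is
    the corresponding ground-truth indicator (a float tensor of 0s and 1s)."""
    return F.binary_cross_entropy(p, y, reduction="sum")
```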
  • FIG. 4 is a block diagram of a computing device 400 that embodies the ICD coding system 100 of FIG. 1 .
  • the computing device 400 is specially configured to execute instructions related to the ICD code assignment process described above, including the application of machine-learned algorithms to diagnostic description records.
  • Computers capable of being specially configured to execute such instructions may be in the form of a laptop, desktop, workstation, or other appropriate computers.
  • the computing device 400 includes a central processing unit (CPU) 402, a memory 404, e.g., random access memory, and computer readable media 406 that store program instructions enabling the CPU and memory to implement the functions of the diagnostic description encoding module 102 and the ICD code assignment module 104 of the ICD coding system 100 described above with reference to FIG. 1.
  • the computing device 400 also includes a user interface 408 and a display 410 , and an interface bus 412 that interconnects all components of the computing device.
  • Computer readable media 406 suitable for storing ICD coding system processing instructions include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, flash memory devices, magnetic disks, magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the CPU 402 and memory 404 execute the ICD coding system processing instructions stored in the computer readable media 406 to thereby perform the functions of the diagnostic description encoding module 102 and the ICD code assignment module 104.
  • the user interface 408 which may be a keyboard or a mouse, and the display 410 allow for a clinician to interface with the computing device 400 .
  • a clinician seeking to obtain a set of ICD codes for a subject patient may input a diagnostic description record of a subject patient for processing.
  • the clinician may then initiate execution of the ICD coding system processing instructions stored in the computer readable media 406 through the user interface 408, and await a display of the assigned ICD codes.
  • FIG. 5 is a schematic block diagram of an apparatus 500 .
  • the apparatus 500 may correspond to one or more processors configured to develop and train the machine-learned algorithm included in the ICD coding system of FIG. 1 .
  • the apparatus 500 may be embodied in any number of processor-driven devices, including, but not limited to, a server computer, a personal computer, one or more networked computing devices, an application-specific circuit, a minicomputer, a microcontroller, and/or any other processor-based device and/or combination of devices.
  • the apparatus 500 may include one or more processing units 502 configured to access and execute computer-executable instructions stored in at least one memory 504 .
  • the processing unit 502 may be implemented as appropriate in hardware, software, firmware, or combinations thereof.
  • Software or firmware implementations of the processing unit 502 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described herein.
  • the processing unit 502 may include, without limitation, a central processing unit (CPU), a digital signal processor (DSP), a reduced instruction set computer (RISC) processor, a complex instruction set computer (CISC) processor, a microprocessor, a microcontroller, a field programmable gate array (FPGA), a System-on-a-Chip (SOC), or any combination thereof.
  • the apparatus 500 may also include a chipset (not shown) for controlling communications between the processing unit 502 and one or more of the other components of the apparatus 500 .
  • the processing unit 502 may also include one or more application-specific integrated circuits (ASICs) or application-specific standard products (ASSPs) for handling specific data processing functions or tasks.
  • the memory 504 may include, but is not limited to, random access memory (RAM), flash RAM, magnetic media storage, optical media storage, and so forth.
  • the memory 504 may include volatile memory configured to store information when supplied with power and/or non-volatile memory configured to store information even when not supplied with power.
  • the memory 504 may store various program modules, application programs, and so forth that may include computer-executable instructions that upon execution by the processing unit 502 may cause various operations to be performed.
  • the memory 504 may further store a variety of data manipulated and/or generated during execution of computer-executable instructions by the processing unit 502 .
  • the apparatus 500 may further include one or more interfaces 506 that may facilitate communication between the apparatus and one or more other apparatuses.
  • the interface 506 may be configured to receive records of diagnostic descriptions and records of ICD code descriptions.
  • Communication may be implemented using any suitable communications standard.
  • a LAN interface may implement protocols and/or algorithms that comply with various communication standards of the Institute of Electrical and Electronics Engineers (IEEE), such as IEEE 802.11, while a cellular network interface may implement protocols and/or algorithms that comply with various communication standards of the Third Generation Partnership Project (3GPP) and 3GPP2, such as 3G and 4G (Long Term Evolution), and of the Next Generation Mobile Networks (NGMN) Alliance, such as 5G.
  • the memory 504 may include an operating system module (O/S) 508 that may be configured to manage hardware resources such as the interface 506 and provide various services to applications executing on the apparatus 500 .
  • the memory 504 stores additional program modules such as: (1) a DD encoding module that receives diagnostic descriptions and generates latent representations of the diagnostic descriptions in the form of encoded DD vectors; (2) an ICD encoding module 512 that receives ICD codes and generates latent representations of the codes in the form of encoded ICD vectors; (3) an adversarial reconciliation module 514 that reconciles the different writing styles of diagnostic descriptions and ICD code descriptions; (4) an attention matching module that maps diagnostic descriptions to ICD codes; and (5) an isotonic constraints module 518 that establishes an order of importance for ICD codes.
  • Each of these modules includes computer-executable instructions that when executed by the processing unit 502 cause various operations to be performed, such as the operations described above.
  • the apparatus 500 and modules disclosed herein may be implemented in hardware or software that is executed on a hardware platform.
  • the hardware or hardware platform may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof, or any other suitable component designed to perform the functions described herein.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.
  • Each patient visit is assigned a list of ICD codes, ranked in descending order of importance and relevance. For each visit, the number of codes is usually not equal to the number of diagnosis descriptions.
  • These ground truth codes serve as the labels to train our coding model.
  • the entire dataset contains 6,984 unique codes, each of which has a textual description describing a disease, symptom, or condition.
  • the codes are organized into a hierarchy where the top-level codes correspond to general diseases while the bottom-level ones represent specific diseases. In the code tree, children of a node represent subtypes of a disease. Table 1 below shows the diagnosis descriptions of a patient visit and the assigned ICD codes. Inside the parentheses are the descriptions of the codes. The codes are ranked according to descending importance.
  • On the first panel are baselines for holistic comparison. On the second panel are baselines compared in the ablation study of the tree-of-sequences LSTM for capturing hierarchical relationships. On the third panel are baselines compared in the ablation studies of adversarial learning for writing-style reconciliation, isotonic constraints for ranking, and attentional matching.
  • Tree-of-sequences LSTM. To evaluate this module, we compared with two configurations: (1) No-TLSTM, which removes the tree LSTM and directly uses the hidden states produced by the sequential LSTM as the final representations of codes; (2) Bottom-up TLSTM, which removes the hidden states generated by the top-down TLSTM.
  • We also compared with four hierarchical classification baselines, including (1) hierarchical network (HierNet), (2) HybridNet, (3) branch network (BranchNet), and (4) label embedding tree (LET), by using them to replace the bidirectional tree LSTM while keeping the other modules untouched.
  • Table 2 shows the average sensitivity and specificity scores achieved by these methods on the test set. We make the following observations.
  • No-TLSTM ignores the hierarchical relationship among codes.
  • the bottom-up tree LSTM alone performs less well than the bidirectional tree LSTM. This demonstrates the necessity of the top-down TLSTM, which ensures every two codes are connected by directed paths and can more expressively capture code-relations in the hierarchy.
  • our method outperforms the four baselines. The possible reason is that our method directly builds codes' hierarchical relationships into their representations, while the baselines learn representations and capture hierarchical relationships separately.
  • No-TLSTM assigns the following codes: 462 (subacute sclerosing panencephalitis), 790.29 (other abnormal glucose), 799.9 (unspecified viral infection), and 285.21 (anemia in chronic kidney disease).
  • the first three are the ground truth and the fourth one is incorrect (the ground truth is 401.9 (unspecified essential hypertension)).
  • Adding tree LSTM fixes this error.
  • the average distance between 401.9 and the rest of ground truth codes is 6.2.
  • For the incorrectly assigned code 285.21, such a distance is 7.9. This demonstrates that tree LSTM is able to capture another constraint imposed by the hierarchy: codes with smaller tree-distance are more likely to be assigned together.
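  • For illustration, such a tree distance can be computed from parent pointers as sketched below; the parent dictionary is a hypothetical representation of the code hierarchy:

```python
def tree_distance(code_a, code_b, parent):
    """Length of the path between two codes in the code tree, given a dict
    mapping each code to its parent (the root has no entry)."""
    def ancestors(c):
        path = [c]
        while c in parent:
            c = parent[c]
            path.append(c)
        return path
    pa, pb = ancestors(code_a), ancestors(code_b)
    common = set(pa) & set(pb)
    # distance = steps from each code up to their lowest common ancestor
    return min(pa.index(c) + pb.index(c) for c in common)
```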
  • Adversarial learning. To evaluate the efficacy of adversarial learning (AL), we remove it from the full model and refer to this baseline as No-AL. Specifically, in Eq. 13, the loss term max_{W_d}(−ℓ_adv(W_s, W_e)) is taken away. Table 2 shows the results, from which we observe that after AL is removed, the sensitivity and specificity drop from 0.29 and 0.33 to 0.26 and 0.31, respectively. No-AL does not reconcile the different writing styles of diagnosis descriptions (DDs) and code descriptions (CDs). As a result, a DD and a CD that have similar semantics may be mismatched because their writing styles are different.
  • a patient (admission ID 147583) has a DD ‘h/o DVT on anticoagulation’, which contains abbreviation DVT (deep vein thrombosis). Due to the presence of this abbreviation, it is difficult to assign a proper code to this DD since the textual descriptions of codes do not contain abbreviations. With adversarial learning, our model can correctly map this DD to a ground truth code: 443.9 (peripheral vascular disease, unspecified). Without AL, this code is not selected. As another example, a DD ‘coronary artery disease, STEMI, s/p 2 stents placed in RCA’ was given to patient 148532.
  • This DD is written informally and ungrammatically, and contains too much detailed information, e.g., ‘s/p 2 stents placed in RCA’. Such a writing style is quite different from that of CDs.
  • With AL, our model successfully matches this DD to a ground truth code: 414.01 (coronary atherosclerosis of native coronary artery). On the contrary, No-AL fails to achieve this.
  • Isotonic constraint To evaluate this ingredient, we remove the ICs from Eq. 13 during training and denote this baseline as No-IC.
  • NDCG. We used normalized discounted cumulative gain (NDCG) to measure the ranking performance, which is calculated in the following way. Consider a testing patient-visit l whose ground truth ICD codes are C^(l). The relevance score of a code c to l is defined as 0 if c ∉ C^(l), and as a positive, rank-dependent value otherwise.
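  • A short sketch of the NDCG computation follows; here the relevance mapping is assumed to be given, with 0 for codes outside the ground truth:

```python
import math

def ndcg(ranked_codes, relevance):
    """NDCG for one visit: `ranked_codes` is the model's ranked code list and
    `relevance` assigns 0 to codes outside the ground truth and a positive
    score to codes inside it."""
    dcg = sum(relevance.get(c, 0) / math.log2(i + 2)
              for i, c in enumerate(ranked_codes))
    ideal = sorted(relevance.values(), reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```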
  • AM can correctly perform the many-to-one mapping from multiple DDs to a CD.
  • patient 190236 was given two DDs: ‘renal insufficiency’ and ‘acute renal failure’.
  • AM maps them to a combined ICD code: 403.91 (hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease), which is in the ground truth provided by medical coders.
  • No-AM fails to assign this code.
  • AM is able to correctly map a DD to multiple CDs.
  • a DD ‘congestive heart failure, diastolic’ was given to patient 140851.
  • AM successfully maps this DD to two codes: (1) 428.0 (congestive heart failure, unspecified); (2) 428.30 (diastolic heart failure, unspecified). Without AM, this DD is mapped only to 428.0.
  • Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the software may reside on a computer-readable medium.
  • a computer-readable medium may include, by way of example, a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk (e.g., compact disk (CD), digital versatile disk (DVD)), a smart card, a flash memory device (e.g., card, stick, key drive), random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a general register, or any other suitable non-transitory medium for storing software.
  • "Module" and "engine", as used herein, refer to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purposes of discussion, the various modules are described as discrete modules; however, as would be apparent to one of ordinary skill in the art, two or more modules may be combined to form a single module that performs the associated functions according to embodiments of the invention.
  • The term "computer program product" may be used generally to refer to media such as memory, storage devices, or a storage unit. These, and other forms of computer-readable media, may be involved in storing one or more instructions for use by a processor, to cause the processor to perform specified operations. Such instructions, generally referred to as "computer program code" (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system.
  • memory or other storage may be employed in embodiments of the invention.
  • any suitable distribution of functionality between different functional units, processing logic elements or domains may be used without detracting from the invention.
  • functionality illustrated to be performed by separate processing logic elements or controllers may be performed by the same processing logic element or controller.
  • references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

Abstract

A system for automatically assigning a set of ICD codes to a patient includes a diagnostic description encoding module and an ICD code assignment module. The diagnostic description encoding module is configured to obtain a diagnostic description vector from at least one diagnostic description record of the patient. The diagnostic description record may be in the form of handwritten physician notes. The ICD code assignment module is configured to apply a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient. When multiple codes are assigned, the machine-learned ICD code assignment algorithm establishes an order of importance for the ICD codes.

Description

    CROSS-REFERENCE TO RELATED APPLICATION(S)
  • This application claims the benefit of and priority to 1) U.S. Provisional Patent Application Ser. No. 62/699,385, filed Jul. 17, 2018, for “Diversity-Promoting and Large-Scale Machine Learning for Healthcare”, and 2) U.S. Provisional Patent Application Ser. No. 62/756,024, filed Nov. 5, 2018, for “Diversity-Promoting and Large-Scale Machine Learning for Healthcare”, the entire disclosures of which are incorporated herein by reference.
  • This application has subject matter in common with: 1) U.S. patent application Ser. No. 16/038,895, filed Jul. 18, 2018, for “A Machine Learning System for Measuring Patient Similarity”, 2) U.S. patent application Ser. No. 15/946,482, filed Apr. 5, 2018, for “A Machine Learning System for Disease, Patient, and Drug Co-Embedding, and Multi-Drug Recommendation”, 3) U.S. patent application Ser. No. ______, filed ______, for “Systems and Methods for Predicting Medications to Prescribe to a Patient Based on Machine Learning”, 4) U.S. patent application Ser. No. ______, filed ______, for “Systems and Methods for Medical Topic Discovery Based on Large-Scale Machine Learning”, and 5) U.S. patent application Ser. No. ______, filed ______, for “Systems and Methods for Automatically Tagging Concepts to, and Generating Text Reports for, Medical Images Based on Machine Learning”, the entire disclosures of which are incorporated herein by reference.
  • TECHNICAL FIELD
  • The present disclosure generally relates to machine learning for healthcare, and more particularly, to systems and methods that apply machine learning algorithms to diagnostic documents to automatically generate an international classification of diseases (ICD) code for a patient.
  • BACKGROUND
  • With the widespread adoption of electronic health records (EHR) systems, and the rapid development of new technologies such as high-throughput medical imaging devices, low-cost genome profiling systems, networked and even wearable sensors, mobile applications, and the rich accumulation of medical knowledge/discoveries in databases, a tsunami of medical and healthcare data has emerged. It was estimated that 153 exabytes (one exabyte equals one billion gigabytes) of healthcare data were produced in 2013, and that an estimated 2,314 exabytes will be produced in 2020, an overall rate of increase of at least 48 percent annually.
  • In addition to the sheer volume, the complexity of healthcare data is also overwhelming. Such data includes clinical notes, medical images, lab values, vital signs, etc., coming from multiple heterogeneous modalities including text, images, tabular data, time series, graphs, and so on. This rich clinical data is becoming an increasingly important source of holistic and detailed information for both healthcare providers and receivers. Collectively analyzing and digesting this rich information generated from multiple sources; uncovering the health implications, risk factors, and mechanisms underlying the heterogeneous and noisy data records at both the individual-patient and whole-population levels; and making clinical decisions, including diagnosis, triage, and treatment, thereupon are now routine activities expected of medical professionals, including physicians, nurses, and pharmacists.
  • As the amount and complexity of medical data rapidly grow, these activities are becoming increasingly difficult for human experts. The information overload makes medical analytics and decision-making time-consuming, error-prone, suboptimal, and less transparent. As a result, physicians, patients, and hospitals suffer a number of pain points, quality-wise and efficiency-wise. For example, in terms of quality, 250,000 Americans die each year from medical errors, which has become the third leading cause of death in the United States. Twelve million Americans are misdiagnosed each year. Preventable medication errors impact more than 7 million patients and cost almost $21 billion annually. Fifteen to twenty-five percent of patients are readmitted within 30 days, and readmissions are costly (e.g., $41.3 billion in 2011). In terms of inefficiency, patients wait on average 6 hours in emergency rooms, and nearly 400,000 patients wait 24 hours or more. Physicians spend only 27 percent of their office day on direct clinical face time with patients. The U.S. healthcare system wastes $750 billion annually due to unnecessary services, inefficient care delivery, excess administrative costs, etc.
  • The advancement of machine learning (ML) technology opens up opportunities for next generation computer-aided medical data analysis and data-driven clinical decision making, where machine learning algorithms and systems can be developed to automatically and collectively digest massive medical data such as electronic health records, images, behavioral data, and the genome, to make data-driven and intelligent diagnostic predictions. An ML system can automatically analyze multiple sources of information with rich structure; uncover the medically meaningful hidden concepts from low-level records to aid medical professionals to easily and concisely understand the medical data; and create a compact set of informative diagnostic procedures and treatment courses and make healthcare recommendations thereupon.
  • It is therefore desirable to leverage the power of machine learning in automatically distilling insights from large-scale heterogeneous data for automatic smart data-driven medical predictions, recommendations, and decision-making, to assist physicians and hospitals in improving the quality and efficiency of healthcare. It is further desirable to have machine learning algorithms and systems that turn the raw clinical data into actionable insights for clinical applications. One such clinical application relates to assigning International Classification of Diseases (ICD) coding.
  • When applying machine learning to healthcare applications, several fundamental issues may arise, including:
  • 1) How to better capture infrequent patterns: At the core of ML-based healthcare is discovering the latent patterns (e.g., topics in clinical notes, disease subtypes, phenotypes) underlying the observed clinical data. Under many circumstances, the frequency of patterns is highly imbalanced: some patterns have very high frequency while others occur infrequently. Existing ML models lack the capability of capturing infrequent patterns; known convolutional neural networks, for example, do not perform well on infrequent patterns. Such a deficiency possibly results from the design of the objective function used for training. For example, a maximum likelihood estimator rewards itself by modeling the frequent patterns well, as they are the major contributors to the likelihood function; infrequent patterns contribute much less to the likelihood, so modeling them well is not very rewarding and they tend to be ignored. Infrequent patterns are nevertheless of crucial importance in clinical settings. For example, many infrequent diseases are life-threatening, and it is critical to capture them.
  • 2) How to alleviate overfitting: In certain clinical applications, the number of medical records available for training is limited. For example, when training a diagnostic model for an infrequent disease, typically there is no access to a sufficiently large number of patient cases due to the rareness of this disease. Under such circumstances, overfitting easily happens, wherein the trained model works well on the training data but generalizes poorly on unseen patients. It is critical to alleviate overfitting.
  • 3) How to improve interpretability: Being interpretable and transparent is a must for an ML model to be willingly used by human physicians. Oftentimes, the patterns extracted by existing ML methods have substantial redundancy and overlap, which makes them ambiguous and difficult to interpret. For example, in computational phenotyping from EHRs, it is observed that the phenotypes learned by standard matrix and tensor factorization algorithms have much overlap, causing confusion, such as when two similar treatment plans are learned for the same type of disease. It is necessary to make the learned patterns distinct and interpretable.
  • 4) How to compress model size without sacrificing modeling power: In clinical practice, making a timely decision is crucial for improving patient outcomes. To achieve time efficiency, the size (specifically, the number of weight parameters) of ML models needs to be kept small. However, reducing the model size, which accordingly reduces the capacity and expressivity of the model, typically sacrifices modeling power and performance. It is technically appealing but challenging to compress model size without losing performance.
  • 5) How to efficiently learn large-scale models: In certain healthcare applications, both the model size and data size are large, incurring substantial computation overhead that exceeds the capacity of a single machine. It is necessary to design and build distributed systems to efficiently train such models.
  • The ICD is a healthcare classification system maintained by the World Health Organization. It provides a hierarchy of diagnostic codes of diseases, disorders, injuries, signs, symptoms, etc. It is widely used for reporting diseases and health conditions, assisting in medical reimbursement decisions, collecting morbidity and mortality statistics, to name a few.
  • While ICD codes are important for making clinical and financial decisions, medical coding—which assigns proper ICD codes to a patient visit—is time-consuming, error-prone, and expensive. Medical coders review the diagnosis descriptions written by physicians in the form of textual phrases and sentences and (if necessary) other information in the electronic medical record of a clinical episode, then manually attribute the appropriate ICD codes by following the coding guidelines. Several types of errors frequently occur. First, the ICD codes are organized in a hierarchical structure. For a node representing a disease C, the children of this node represent the subtypes of C. In many cases, the difference between disease subtypes is very subtle, and it is common for human coders to select incorrect subtypes. Second, when writing diagnosis descriptions, physicians often utilize abbreviations and synonyms, which causes ambiguity and imprecision when the coders are matching ICD codes to those descriptions. Third, in many cases, several diagnosis descriptions are closely related and should be mapped to a single ICD code; however, inexperienced coders may code each disease separately. Such errors are called unbundling. The cost incurred by coding errors and the financial investment spent on improving coding quality are estimated to be $25 billion per year in the United States.
  • To reduce coding errors and cost, it is desirable to build an ICD coding model which automatically and accurately translates the free-text diagnosis descriptions into ICD codes. To achieve this goal, several technical challenges need to be addressed.
  • First, there exists a hierarchical structure among the ICD codes. This hierarchy can be leveraged to improve coding accuracy. On one hand, if codes A and B are both children of C, it is unlikely that A and B would be simultaneously assigned to a patient. On the other hand, if the distance between A and B in the code tree is smaller than that between A and C, and A is known to be a correct code, then B is more likely than C to be a correct code, since codes with smaller distance are more clinically relevant. How to exploit this hierarchical structure for better coding is technically demanding.
  • Second, the diagnosis descriptions and the textual descriptions of ICD codes are written in quite different styles even if they refer to the same disease. In particular, the textual description of an ICD code is formally and precisely worded, while diagnosis descriptions are usually written by physicians in an informal and ungrammatical way, with telegraphic phrases, abbreviations, and typos.
  • Third, it is required that the assigned ICD codes are ranked according to their relevance to the patient. How to correctly determine this order is technically nontrivial.
  • Fourth, as stated earlier, there does not necessarily exist a one-to-one mapping between diagnosis descriptions and ICD codes, and human coders should consider the overall health condition when assigning codes. In many cases, two closely related diagnosis descriptions need to be mapped onto a single combination ICD code. On the other hand, physicians may write two health conditions into one diagnosis description which should be mapped onto two ICD codes under such circumstances.
  • SUMMARY
  • In one aspect of the disclosure, a method of assigning a set of international classification of diseases (ICD) codes to a patient includes obtaining a diagnostic description vector from at least one diagnostic description record of the patient; and applying a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
  • In another aspect of the disclosure, a system for assigning a set of ICD codes to a patient includes a diagnostic description encoding module and an ICD code assignment module. The diagnostic description encoding module is configured to obtain a diagnostic description vector from at least one diagnostic description record of the patient. The ICD code assignment module is configured to apply a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
  • In another aspect of the disclosure, a machine learning apparatus for generating a map between diagnostic descriptions and ICD codes includes a processor and a memory coupled to the processor. The processor is configured to generate representations of diagnostic descriptions in a form of diagnostic descriptions vectors, and to generate representations of ICD codes in a form of ICD vectors. The processor is further configured to process the diagnostic descriptions vectors and the ICD vectors to obtain an importance score between each diagnostic description represented in a diagnostic description vector and each ICD represented in an ICD vector, and to associate each diagnostic description represented in the diagnostic description vector with one or more ICDs represented in the ICD vector based on the importance scores.
  • It is understood that other aspects of methods and systems will become readily apparent to those skilled in the art from the following detailed description, wherein various aspects are shown and described by way of illustration.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Various aspects of apparatuses and methods will now be presented in the detailed description by way of example, and not by way of limitation, with reference to the accompanying drawings, wherein:
  • FIG. 1 is a block diagram of a system for generating an international classification of diseases (ICD) code for a patient using a machine-learned algorithm.
  • FIG. 2 is a block diagram of a model used to develop and train the machine-learned algorithm of FIG. 1.
  • FIG. 3 is a block diagram of the architectural layers and functionality of the model used to develop and train the machine-learned algorithm of FIG. 1.
  • FIG. 4 is a block diagram of a computing device that embodies the system of FIG. 1.
  • FIG. 5 is a block diagram of an apparatus that develops and trains the machine-learned algorithm of FIG. 1.
  • DETAILED DESCRIPTION
  • Disclosed herein is a system for automatically assigning an international classification of diseases (ICD) code for a patient using a machine-learned algorithm. The system comprises a neural architecture that automatically performs ICD coding based on an input corresponding to a patient's diagnosis descriptions. The diagnosis descriptions may be input in the form of a physician's free-form writing. The neural architecture has four aspects. First, the architecture uses a tree-of-sequences long short-term memory (LSTM) network to simultaneously capture the hierarchical relationship among ICD codes and the semantics of each ICD code. Second, the architecture utilizes an adversarial learning approach to reconcile the different writing styles of diagnosis descriptions and ICD code descriptions. Third, the architecture utilizes isotonic constraints to preserve the importance order among codes, and an algorithm based on the alternating direction method of multipliers (ADMM) to solve the constrained problem. Fourth, the architecture employs an attentional matching mechanism to perform many-to-one and one-to-many mappings between diagnosis descriptions and ICD codes. Some of the concepts and features described herein are included in Diversity-promoting and Large-scale Machine Learning for Healthcare, a thesis submitted by Pengtao Xie in August 2018 to the Machine Learning Department, School of Computer Science, Carnegie Mellon University, which is hereby incorporated by reference in its entirety.
  • With reference to FIG. 1, in one configuration, an ICD coding system 100 includes a diagnostic description encoding module 102 and an ICD code assignment module 104. The diagnostic description encoding module 102 is configured to receive a diagnostic description record 106 for a subject patient and to produce a representation of the record as an encoded diagnostic description (DD) vector 108. The ICD code assignment module 104 receives the diagnostic description as the diagnostic description (DD) vector 108 and applies a previously-trained machine-learned algorithm 110 to the vector. The machine-learned algorithm 110 determines relevant ICD codes corresponding to the diagnostic descriptions included in the diagnostic description vector 108 and outputs an assigned ICD code 112 for the patient.
  • Regarding the diagnostic description encoding module 102, it is configured to extract information from the diagnostic description record 106 and derive the encoded diagnostic description (DD) vector 108 from the extracted information. The diagnostic description record 106 may be handwritten diagnostic notes identifying one or more diagnoses of the patient. In one embodiment, the diagnostic description encoding module 102 employs an LSTM recurrent neural network (RNN) to encode the diagnosis descriptions. An example of such a recurrent network is described in Martin Sundermeyer, Ralf Schluter, and Hermann Ney. LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association, 2012, the disclosure of which is herein incorporated by reference. In another embodiment, the diagnostic description encoding module 102 employs a sequential long short-term memory (SLSTM) network to encode each description individually.
  • With reference to FIG. 2, the machine-learned algorithm 110 of the ICD code assignment module 104 of FIG. 1 is designed and trained using a model that consists of five modules: a diagnostic description encoding module 202 that encodes diagnostic descriptions; an ICD code description encoding module 204 that encodes ICD codes based on their textual descriptions; an adversarial reconciliation module 206; and an attentional matching module 208 (with an embedded isotonic constraints module 210) that matches diagnosis descriptions with ICD codes and assigns the ICD codes. The functional architecture of the model of FIG. 2 is shown in FIG. 3. The diagnostic description encoding module 202 used to develop and train the machine-learned algorithm 110 is configured to encode diagnostic descriptions in the same way as the diagnostic description encoding module 102 of the ICD coding system 100.
  • Diagnostic Description Encoding Module
  • The diagnostic description encoding module 202 is configured to receive diagnostic descriptions and generate a latent representation of the diagnostic descriptions in the form of an encoded DD vector 216.
  • In one embodiment, the diagnostic description encoding module 202 employs a LSTM recurrent neural network (RNN) to encode the diagnosis descriptions. An example of such a recurrent network is described in Martin Sundermeyer, Ralf Schluter, and Hermann Ney. LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association, 2012, the disclosure of which is herein incorporated by reference.
  • LSTM is a popular variant of the recurrent neural network. Due to its capacity for capturing long-range semantics in text, LSTM is widely used for language modeling and sequence encoding. An LSTM recurrent network consists of a sequence of units, each of which models one item in the input sequence. For each diagnosis description, both a character-level LSTM network and a word-level LSTM network are used to obtain its hidden representation. The reason for using character-aware encoding is that many medical terms share suffixes denoting similar diseases, and the character-level LSTM captures such characteristics. With reference to FIG. 3, the hidden representations of the written diagnosis descriptions are denoted as h_1, . . . , h_m, where m is the number of diagnosis descriptions in one record.
  • In another embodiment, the diagnostic description encoding module 202 employs a sequential long short-term memory (SLSTM) network to encode each description individually. The weight parameters of this SLSTM are tied with those of the SLSTM adopted by the ICD code description encoding module 204 for encoding ICD code descriptions, as described below.
  • ICD Code Description Encoding Module
  • The ICD code description encoding module 204 is configured to receive ICD codes 212 and generate a latent representation of the codes in the form of an encoded ICD vector 218. Each ICD code has a description (a sequence of words) that tells the semantics of this code.
  • In one embodiment, the ICD code description encoding module 204 adopts the same two-level LSTM architecture, i.e., character-level and word-level, used for diagnostic description encoding to obtain the hidden representation of each code's textual description. The parameters of the neural networks for the ICD code description encoding module 204 and the diagnostic description encoding module 202 are not tied, in order to learn the different language styles of these two sets of texts. With reference to FIG. 3, the hidden representations of different ICD codes are denoted as u_1, . . . , u_n, where n is the total number of codes.
  • In another configuration, the ICD code description encoding module 204 employs a sequential LSTM (SLSTM) to encode this description. To capture the hierarchical relationship among codes, a tree-of-sequences LSTM (TLSTM) is built along the code tree. The inputs of TLSTM include the code hierarchy and the hidden states of individual codes produced by the SLSTMs. It consists of a bottom-up TLSTM and a top-down TLSTM, which produce two hidden states h↑ and h↓ at each node in the tree. A more detailed description of a TLSTM is presented below.
  • In either configuration, the ICD code description encoding module 204 takes the textual descriptions of the ICD codes 212 and their hierarchical structure as inputs and produces a latent representation for each code and includes the representation in an ICD vector 218. The representation aims at simultaneously capturing the semantics of each code and the hierarchical relationship among codes. By incorporating the code hierarchy, the model can avoid selecting codes that are subtypes of the same disease and promote the selection of codes that are clinically correlated.
  • Tree-of-Sequences Long Short-Term Memory (LSTM) Network
  • As mentioned above, each ICD code has a description (a sequence of words) that tells the semantics of this code. The ICD code description encoding module 204 employs a sequential LSTM (SLSTM) to encode this description. To capture the hierarchical relationship among codes, a tree-of-sequences LSTM (TLSTM) is built along the code tree. The inputs of this LSTM include the code hierarchy and the hidden states of individual codes produced by the SLSTMs. It consists of a bottom-up TLSTM and a top-down TLSTM, which produce two hidden states h↑ and h↓ at each node in the tree.
  • Sequential LSTM: An SLSTM network is a special type of recurrent neural network that (1) learns the latent representation (which usually reflects certain semantic information) of words, and (2) models the sequential structure among words. In the word sequence, each word t is allocated an SLSTM unit, which consists of the following components: input gate i_t, forget gate f_t, output gate o_t, memory cell c_t, and hidden state s_t. These components (vectors) are computed as follows:

$$i_t = \sigma\big(W^{(i)} s_{t-1} + U^{(i)} x_t + b^{(i)}\big)$$
$$f_t = \sigma\big(W^{(f)} s_{t-1} + U^{(f)} x_t + b^{(f)}\big)$$
$$o_t = \sigma\big(W^{(o)} s_{t-1} + U^{(o)} x_t + b^{(o)}\big)$$
$$c_t = i_t \odot \tanh\big(W^{(c)} s_{t-1} + U^{(c)} x_t + b^{(c)}\big) + f_t \odot c_{t-1}$$
$$s_t = o_t \odot \tanh(c_t), \tag{Eq. 1}$$

where x_t is the embedding vector of word t, the W and U are component-specific weight matrices, and the b are bias vectors.
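  • To make Eq. 1 concrete, the following minimal NumPy sketch steps a single SLSTM unit through a short word sequence. The dimensions, initialization, and parameter names are illustrative assumptions for the sketch, not part of the disclosure.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def slstm_step(x_t, s_prev, c_prev, P):
        # One SLSTM unit (Eq. 1); P holds the component-specific W, U, b.
        i = sigmoid(P["W_i"] @ s_prev + P["U_i"] @ x_t + P["b_i"])  # input gate
        f = sigmoid(P["W_f"] @ s_prev + P["U_f"] @ x_t + P["b_f"])  # forget gate
        o = sigmoid(P["W_o"] @ s_prev + P["U_o"] @ x_t + P["b_o"])  # output gate
        c = i * np.tanh(P["W_c"] @ s_prev + P["U_c"] @ x_t + P["b_c"]) + f * c_prev
        s = o * np.tanh(c)                                          # hidden state
        return s, c

    # Illustrative sizes: 200-dim word embeddings, 100-dim hidden states.
    d_x, d_h = 200, 100
    rng = np.random.default_rng(0)
    P = {f"W_{g}": 0.01 * rng.standard_normal((d_h, d_h)) for g in "ifoc"}
    P.update({f"U_{g}": 0.01 * rng.standard_normal((d_h, d_x)) for g in "ifoc"})
    P.update({f"b_{g}": np.zeros(d_h) for g in "ifoc"})
    s = c = np.zeros(d_h)
    for x_t in rng.standard_normal((5, d_x)):  # a five-word description
        s, c = slstm_step(x_t, s, c, P)        # s is the running representation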
  • Tree-of-sequences LSTM: A bi-directional tree LSTM (TLSTM) captures the hierarchical relationships among codes. The inputs of this TLSTM include the code hierarchy and the hidden states of individual codes produced by the SLSTMs. It consists of a bottom-up TLSTM and a top-down TLSTM, which produce two hidden states h↑ and h↓ at each node in the tree.
  • In the bottom-up TLSTM, an internal node (representing a code C, having M children) comprises the following components: an input gate i↑, an output gate o↑, a memory cell c↑, a hidden state h↑, and M child-specific forget gates {f↑^(m)}_{m=1}^M, where f↑^(m) corresponds to the m-th child. The transition equations among components are:

$$i_\uparrow = \sigma\Big(\sum_{m=1}^{M} W^{(i,m)} h_\uparrow^{(m)} + U^{(i)} s + b^{(i)}\Big)$$
$$\forall m,\ f_\uparrow^{(m)} = \sigma\big(W^{(f,m)} h_\uparrow^{(m)} + U^{(f,m)} s + b^{(f,m)}\big)$$
$$o_\uparrow = \sigma\Big(\sum_{m=1}^{M} W^{(o,m)} h_\uparrow^{(m)} + U^{(o)} s + b^{(o)}\Big)$$
$$u_\uparrow = \tanh\Big(\sum_{m=1}^{M} W^{(u,m)} h_\uparrow^{(m)} + U^{(u)} s + b^{(u)}\Big)$$
$$c_\uparrow = i_\uparrow \odot u_\uparrow + \sum_{m=1}^{M} f_\uparrow^{(m)} \odot c_\uparrow^{(m)}$$
$$h_\uparrow = o_\uparrow \odot \tanh(c_\uparrow), \tag{Eq. 2}$$
  • where s is the SLSTM hidden state that encodes the name of the concept C, and {h↑^(m)}_{m=1}^M and {c↑^(m)}_{m=1}^M are the bottom-up TLSTM hidden states and memory cells of the children. The W, U, and b are component-specific weight matrices and bias vectors. For a leaf node having no children, its only input is the SLSTM hidden state s and no forget gates are needed. The transition equations are:

$$i_\uparrow = \sigma\big(U^{(i)} s + b^{(i)}\big)$$
$$o_\uparrow = \sigma\big(U^{(o)} s + b^{(o)}\big)$$
$$u_\uparrow = \tanh\big(U^{(u)} s + b^{(u)}\big)$$
$$c_\uparrow = i_\uparrow \odot u_\uparrow$$
$$h_\uparrow = o_\uparrow \odot \tanh(c_\uparrow) \tag{Eq. 3}$$
  • In the top-down TLSTM, a non-root node has the following components: an input gate i↓, a forget gate f↓, an output gate o↓, a memory cell c↓, and a hidden state h↓. The transition equations are:

$$i_\downarrow = \sigma\big(W^{(i)} h_\downarrow^{(p)} + b^{(i)}\big)$$
$$f_\downarrow = \sigma\big(W^{(f)} h_\downarrow^{(p)} + b^{(f)}\big)$$
$$o_\downarrow = \sigma\big(W^{(o)} h_\downarrow^{(p)} + b^{(o)}\big)$$
$$u_\downarrow = \tanh\big(W^{(u)} h_\downarrow^{(p)} + b^{(u)}\big)$$
$$c_\downarrow = i_\downarrow \odot u_\downarrow + f_\downarrow \odot c_\downarrow^{(p)}$$
$$h_\downarrow = o_\downarrow \odot \tanh(c_\downarrow), \tag{Eq. 4}$$
  • where h↓^(p) and c↓^(p) are the top-down TLSTM hidden state and memory cell of the parent of this node. For the root node, which has no parent, h↓ cannot be computed using the above equations. Instead, we set h↓ to h↑ (the bottom-up TLSTM hidden state generated at the root node). At the root, h↑ captures the semantics of all codes, which is then propagated downwards to each individual code via the top-down TLSTM dynamics.
  • The hidden states of the two directions are concatenated to obtain the bidirectional TLSTM encoding of each concept, h = [h↑; h↓]. The bottom-up TLSTM composes the semantics of children (representing sub-codes) and merges them into the current node, and hence captures the child-to-parent relationship. The top-down TLSTM makes each node inherit the semantics of its parent, which captures the parent-to-child relationship. As a result, the hierarchical relationship among codes is encoded in the hidden states.
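  • As a sketch of the bottom-up pass, the NumPy function below implements the internal-node update of Eq. 2. For brevity, the child-specific matrices W^(·,m) are shared across children here, a simplifying assumption rather than the disclosed per-child form; with an empty child list, the update reduces to the leaf-node case of Eq. 3.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def bottom_up_tlstm_node(s, child_h, child_c, P):
        # s: SLSTM encoding of this code's description.
        # child_h, child_c: bottom-up hidden states / memory cells of the children.
        i = sigmoid(sum(P["W_i"] @ h for h in child_h) + P["U_i"] @ s + P["b_i"])
        f = [sigmoid(P["W_f"] @ h + P["U_f"] @ s + P["b_f"]) for h in child_h]
        o = sigmoid(sum(P["W_o"] @ h for h in child_h) + P["U_o"] @ s + P["b_o"])
        u = np.tanh(sum(P["W_u"] @ h for h in child_h) + P["U_u"] @ s + P["b_u"])
        c = i * u + sum(fm * cm for fm, cm in zip(f, child_c))
        h = o * np.tanh(c)
        return h, c

    d = 100
    rng = np.random.default_rng(0)
    P = {f"{k}_{g}": 0.01 * rng.standard_normal((d, d)) for k in "WU" for g in "ifou"}
    P.update({f"b_{g}": np.zeros(d) for g in "ifou"})
    leaf_h, leaf_c = bottom_up_tlstm_node(rng.standard_normal(d), [], [], P)  # Eq. 3
    h_up, c_up = bottom_up_tlstm_node(rng.standard_normal(d), [leaf_h], [leaf_c], P)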
  • To address the issue of overfitting described above in the background section, diversity-promoting regularization may be leveraged to improve the performance of the ICD code description encoding module 204. Diversity-promoting regularization imposes a structural constraint on parameters of the ICD code description encoding module 204, which reduces the model capacity and therefore improves generalization performance on unseen data.
  • In accordance with embodiments disclosed herein, overfitting may be alleviated using a diversity-promoting regularization in the form of 1) a uniform eigenvalue regularizer (UER) applied to an LSTM network, or 2) angular constraints.
  • Uniform Eigenvalue Regularizer
  • In some embodiments, uncorrelation among components may be characterized from a statistical perspective by treating components as random variables and measuring their covariance, which is proportional to their correlation. In one embodiment, A ∈ ℝ^{d×m} denotes the component matrix whose k-th column is the parameter vector a_k of component k. In some embodiments, a row view of A may be used, where each component is treated as a random variable and each row vector ã_i^T is a sample drawn from the random vector formed by the m components. Further,

$$\mu = \frac{1}{d}\sum_{i=1}^{d} \tilde{a}_i = \frac{1}{d} A^T \mathbf{1}$$

may be set as the sample mean, where the elements of 1 ∈ ℝ^d are all 1. An empirical covariance matrix may then be computed with the components as:

$$G = \frac{1}{d}\sum_{i=1}^{d} (\tilde{a}_i - \mu)(\tilde{a}_i - \mu)^T = \frac{1}{d} A^T A - \Big(\frac{1}{d} A^T \mathbf{1}\Big)\Big(\frac{1}{d} A^T \mathbf{1}\Big)^T. \tag{Eq. 5}$$

By imposing the constraint A^T 1 = 0, G = (1/d) A^T A. Suppose A is a full-rank matrix and d ≥ m; then G is a full-rank matrix with rank m.
  • For the next step, the eigenvalues of G play important roles in characterizing the uncorrelation and evenness of components. Let G = Σ_{k=1}^m λ_k u_k u_k^T be the eigendecomposition, where λ_k is an eigenvalue and u_k is the associated eigenvector. In principal component analysis, an eigenvector u_k of the covariance matrix G represents a principal direction of the data points and the associated eigenvalue λ_k tells the variability of points along that direction. The larger λ_k is, the more spread out the points are along the direction u_k. When the eigenvectors (principal directions) are not aligned with the coordinate axes, the level of disparity among eigenvalues indicates the level of correlation among the m components (random variables): the more different the eigenvalues are, the higher the correlation is. Considering this, the uniformity among eigenvalues of G can be utilized to measure how uncorrelated the components are.
  • Secondly, the eigenvalues are related to the other factor of diversity: evenness. When the eigenvectors are aligned with the coordinate axes, the components are uncorrelated, and in this case evenness is used to measure diversity. In this example, each component is assigned an importance score. Since the eigenvectors are parallel to the coordinate axes, the eigenvalues reflect the variances of the components. Analogous to principal component analysis, which posits that random variables with larger variance are more important, the present embodiment may use variance to measure importance. According to the evenness criterion, the components are more diverse if their importance scores are even, which motivates encouraging the eigenvalues to be uniform.
  • To sum up, the eigenvalues are encouraged to be even in both cases: (1) when the eigenvectors are not aligned with the coordinate axis, they are preferred to be even to reduce the correlation of components; (2) when the eigenvectors are aligned with the coordinate axis, they are encouraged to be even such that different components contribute equally in modeling data.
  • In some embodiments, to promote uniformity among eigenvalues, as a general approach, the eigenvalues may be normalized into a probability simplex and the discrete distribution parameterized by the normalized eigenvalues may then be encouraged to have small Kullback-Leibler (KL) divergence with the uniform distribution. Given the eigenvalues {λ_k}_{k=1}^m, they are normalized into a probability simplex:

$$\hat{\lambda}_k = \frac{\lambda_k}{\sum_{j=1}^{m} \lambda_j},$$

based on which a distribution is defined on a discrete random variable X = 1, . . . , m with p(X = k) = λ̂_k. In addition, to ensure the eigenvalues are strictly positive, A^T A may be required to be positive definite. To encourage {λ_k}_{k=1}^m to be uniform, the distribution p(X) is made “close” to the uniform distribution q(X = k) = 1/m, where the “closeness” is measured using the KL divergence:

$$\mathrm{KL}(p \,\|\, q) = \sum_{k=1}^{m} \hat{\lambda}_k \log\frac{\hat{\lambda}_k}{1/m} = \frac{\sum_{k=1}^{m} \lambda_k \log \lambda_k}{\sum_{j=1}^{m} \lambda_j} - \log\sum_{j=1}^{m} \lambda_j + \log m. \tag{Eq. 6}$$
  • In this equation, Σ_{k=1}^m λ_k log λ_k is equivalent to tr(((1/d)A^T A) log((1/d)A^T A)), where log(·) denotes the matrix logarithm. To show this, note that

$$\log\Big(\frac{1}{d} A^T A\Big) = \sum_{k=1}^{m} \log(\lambda_k)\, u_k u_k^T,$$

according to the property of the matrix logarithm. Then tr(((1/d)A^T A) log((1/d)A^T A)) is equal to tr((Σ_{k=1}^m λ_k u_k u_k^T)(Σ_{k=1}^m log(λ_k) u_k u_k^T)), which equals Σ_{k=1}^m λ_k log λ_k. According to the property of the trace, tr((1/d)A^T A) = Σ_{k=1}^m λ_k. The KL divergence can then be turned into a diversity-promoting uniform eigenvalue regularizer (UER):

$$\frac{\mathrm{tr}\big(\big(\frac{1}{d}A^T A\big)\log\big(\frac{1}{d}A^T A\big)\big)}{\mathrm{tr}\big(\frac{1}{d}A^T A\big)} - \log \mathrm{tr}\Big(\frac{1}{d}A^T A\Big), \tag{Eq. 7}$$
  • subject to A^T A ≻ 0 and A^T 1 = 0. The UER may then be applied to promote diversity. For example, let ℒ(A) denote the objective function of an ML model; then a UE-regularized ML problem can be defined as

$$\min_{A}\ \mathcal{L}(A) + \lambda\left(\frac{\mathrm{tr}\big(\big(\frac{1}{d}A^T A\big)\log\big(\frac{1}{d}A^T A\big)\big)}{\mathrm{tr}\big(\frac{1}{d}A^T A\big)} - \log \mathrm{tr}\Big(\frac{1}{d}A^T A\Big)\right)$$

subject to A^T A ≻ 0 and A^T 1 = 0, where λ is the regularization parameter. In principle, the “closeness” between p and q can be measured by other distances, such as the total variation distance, Hellinger distance, etc. However, the resulting formula defined on the eigenvalues (like the one in Eq. 6) is very difficult, if at all possible, to transform into a formula defined on A (like the one in Eq. 7). Consequently, it is very challenging to perform estimation of A. In light of this, we choose to use the KL divergence.
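  • As an illustration, the UER of Eq. 7 can be evaluated directly from the eigenvalues of G = (1/d)A^T A, since the trace form and the eigenvalue form coincide. The sketch below is a minimal NumPy implementation; the matrix sizes and the centering step are illustrative assumptions.

    import numpy as np

    def uer(A):
        # Uniform eigenvalue regularizer (Eq. 7) for a d x m component matrix A.
        # Assumes A^T 1 = 0 and A^T A positive definite, as required in the text.
        d = A.shape[0]
        lam = np.linalg.eigvalsh((A.T @ A) / d)  # eigenvalues of G = (1/d) A^T A
        return float((lam @ np.log(lam)) / lam.sum() - np.log(lam.sum()))

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 5))             # d > m keeps G full rank
    A -= A.mean(axis=0, keepdims=True)           # enforce A^T 1 = 0
    print(uer(A))                                # smaller value -> more uniform spectrum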
  • Compared with previous diversity-promoting regularizers, UER has the following benefits: (1) It measures the diversity of all components in a holistic way, rather than reducing to pairwise dissimilarities as other regularizers do. This enables UER to capture global relations among components. (2) Unlike determinant-based regularizers that are sensitive to vector scaling, UER is derived from normalized eigenvalues where the normalization effectively removes scaling. (3) UER is amenable for computation. First, unlike the decorrelation regularizer that is defined over data-dependent intermediate variables and thus incurs computational inefficiency, UER is directly defined on model parameters and is independent of data. Second, unlike the regularizers that are non-smooth, UER is a smooth function. In general, smooth functions are more amenable for deriving optimization algorithms than non-smooth functions. The dominating computation in UER is matrix logarithm. It does not substantially increase computational overhead as long as the number of components is not too large (e.g., less than 1000).
  • Uniform Eigenvalue Regularized LSTM
  • The LSTM network is a type of recurrent neural network that is better at capturing long-term dependencies in sequential modeling. At each time step t, where the input is x_t, there is an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t, and a hidden state h_t. The transition equations among them are:

$$i_t = \sigma\big(W^{(i)} x_t + U^{(i)} h_{t-1} + b^{(i)}\big)$$
$$f_t = \sigma\big(W^{(f)} x_t + U^{(f)} h_{t-1} + b^{(f)}\big)$$
$$o_t = \sigma\big(W^{(o)} x_t + U^{(o)} h_{t-1} + b^{(o)}\big)$$
$$c_t = i_t \odot \tanh\big(W^{(c)} x_t + U^{(c)} h_{t-1} + b^{(c)}\big) + f_t \odot c_{t-1}$$
$$h_t = o_t \odot \tanh(c_t), \tag{Eq. 8}$$
  • where 𝒲 = {W^(s) | s ∈ S = {i, f, o, c}} and 𝒰 = {U^(s) | s ∈ S} are gate-specific weight matrices and ℬ = {b^(s) | s ∈ S} are bias vectors. The row vectors in the W and U matrices are treated as components. Let ℒ(𝒲, 𝒰, ℬ) denote the loss function of an LSTM network and Ω(·) denote the UER (including its constraints); then a UE-regularized LSTM problem can be defined as:

$$\min_{\mathcal{W},\mathcal{U},\mathcal{B}}\ \mathcal{L}(\mathcal{W},\mathcal{U},\mathcal{B}) + \lambda \sum_{s\in S}\big(\Omega(W^{(s)}) + \Omega(U^{(s)})\big) \tag{Eq. 9}$$
  • The LSTM network is applied for cloze-style reading comprehension (CSRC).
  • Angular Constraints
  • Near-orthogonality may be used to represent “diversity,” using a regularization approach, angular constraints (ACs), where the angle between components is constrained to be close to π/2, which encourages the components to be close to orthogonal. Analysis shows that the closer to π/2 the angles are, the smaller the estimation error is and the larger the approximation error is. The best tradeoff between these two errors can be explored by properly tuning the angles. An algorithm based on the alternating direction method of multipliers (ADMM) solves the angle-constrained problems.
  • Angular constraints (ACs) use near-orthogonality to characterize “diversity” and encourage the angles between component vectors to be close to π/2. The ACs are defined as requiring the absolute value of the cosine similarity between each pair of components to be less than or equal to a small value τ, which leads to the following angle-constrained problem:
$$\min_{\mathcal{W}}\ \mathcal{L}(\mathcal{W}) \quad \text{s.t.}\quad \forall\, 1 \le i < j \le m,\ \frac{|w_i \cdot w_j|}{\|w_i\|_2\,\|w_j\|_2} \le \tau, \tag{Eq. 10}$$
  • where 𝒲 = {w_i}_{i=1}^m denotes the component vectors and ℒ(𝒲) is the objective function of this problem. The parameter τ controls the level of near-orthogonality (or diversity). A smaller τ indicates that the vectors are closer to being orthogonal, and hence are more diverse. As will be shown later, representing diversity using the angular constraints facilitates theoretical analysis and is empirically effective as well.
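  • The feasibility condition in Eq. 10 is simple to state in code. The NumPy sketch below (with illustrative shapes) checks whether a set of component vectors satisfies the ACs for a given τ.

    import numpy as np

    def satisfies_angular_constraints(W, tau):
        # Check Eq. 10: |cos(w_i, w_j)| <= tau for all pairs i < j.
        # W is an (m x k) array whose rows are the component vectors.
        Wn = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-normalize rows
        C = np.abs(Wn @ Wn.T)                              # |cosine similarity|
        np.fill_diagonal(C, 0.0)                           # ignore self-similarity
        return bool(C.max() <= tau)

    rng = np.random.default_rng(0)
    print(satisfies_angular_constraints(rng.standard_normal((10, 100)), tau=0.2))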
  • Adversarial Reconciliation Module
  • The writing styles of diagnostic descriptions (DDs) and code descriptions (CDs) are largely different, which makes the matching between a DD and a CD error-prone. To address this issue, an adversarial learning approach reconciles the writing styles. On top of the latent representation DD vectors 216, a discriminative network is built to distinguish which inputs are DDs and which are CDs. The diagnostic description encoding module 202 and the ICD code description encoding module 204 try to make such a discrimination impossible. By doing this, the learned representations are independent of the writing styles and facilitate more accurate matching.
  • To this end, an adversarial learning approach is used to reconcile the different writing styles of diagnosis descriptions and code descriptions. The basic idea is: after encoding, if a description cannot be discerned to be a DD or a CD, then the difference in writing styles has been eliminated. A discriminative network included in the adversarial reconciliation module 206 takes the encoding vector of a description as input and tries to identify it as a DD or a CD. The diagnostic description encoding module 202 and the ICD code description encoding module 204 adjust their weight parameters so that such a discrimination is difficult for the discriminative network to achieve.
  • Consider all the descriptions {t_r, y_r}_{r=1}^R, where t_r is a description and y_r is a binary label: y_r = 1 if t_r is a DD and y_r = 0 otherwise. Let f(t_r; W_s) denote the sequential LSTM (SLSTM) encoder parameterized by W_s. This SLSTM encoder is shared by the diagnostic description encoding module 202 and the ICD code description encoding module 204. Note that for CDs, a TLSTM is further applied on top of the encodings produced by the SLSTM. The SLSTM encoding vectors of CDs are used as the input of the discriminative network rather than the TLSTM encodings, since the latter are irrelevant to writing styles. Let g(f(t_r; W_s); W_d) denote the discriminative network parameterized by W_d. It takes the encoding vector f(t_r; W_s) as input and produces the probability that t_r is a DD. Adversarial learning is performed by solving this problem:
$$\max_{W_s}\min_{W_d}\ \mathcal{L}_{adv} = \sum_{r=1}^{R} \mathrm{CE}\big(g(f(t_r; W_s); W_d),\, y_r\big) \tag{Eq. 11}$$
  • The discriminative network tries to differentiate DDs from CDs by minimizing this classification loss while the encoder maximizes this loss so that DDs and CDs are not distinguishable.
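  • The alternating min-max of Eq. 11 can be sketched with a toy NumPy setup in which a linear map stands in for the LSTM encoder f(·; W_s) and a logistic-regression layer stands in for the discriminative network g(·; W_d). All data, shapes, and learning rates here are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    T = rng.standard_normal((200, 50))         # 200 bag-of-feature descriptions
    y = rng.integers(0, 2, 200)                # 1 = DD, 0 = CD
    W_s = 0.1 * rng.standard_normal((20, 50))  # "encoder" parameters
    w_d = np.zeros(20)                         # "discriminator" parameters
    lr = 0.1

    for step in range(200):
        H = T @ W_s.T                          # f(t; W_s): encodings
        p = sigmoid(H @ w_d)                   # g(f(t); W_d): P(t is a DD)
        err = p - y                            # gradient of the CE loss wrt logits
        w_d -= lr * (H.T @ err) / len(y)       # discriminator descends the CE loss
        # encoder ascends the same loss, making DDs and CDs indistinguishable:
        W_s += lr * np.outer(w_d, err @ T) / len(y)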
  • Attentional Matching Module
  • The attentional matching module 208 is configured to map diagnostic descriptions to ICD codes. The encoded DD vectors 216 and the encoded ICD vectors 218 are fed into the attentional matching module 208 to perform code assignments. The attentional matching module 208 allows multiple diagnostic descriptions to be matched to a single code and allows a single diagnostic description to be matched to multiple codes. An order of importance among codes is incorporated by the isotonic constraints module 210. These constraints regulate the weight parameters of the model so that codes with higher importance are given larger prediction scores.
  • Typically, the number of written diagnosis descriptions does not equal the number of assigned ICD codes. Accordingly, the attentional matching module 208 disclosed herein is configured to take all diagnosis descriptions into account during coding by adopting an attention strategy. The attentional matching module 208 provides a recipe for choosing which diagnosis descriptions are important when performing coding. For the i-th ICD code, an importance score or attention score a_{i,j} on the j-th diagnosis description is calculated as u_i^T h_j. The attentional matching module 208 may utilize these attention scores based on a hard selection mechanism or a soft attention mechanism.
  • The hard selection mechanism is based on the assumption that the most related diagnosis description plays a decisive role when assigning ICD codes. In this mechanism, for each ICD code, the dominating diagnosis is defined as the one that has the maximum attention score among all diagnosis descriptions. The probability of the i-th ICD code being assigned is thus p_i = sigmoid(max_{j=1,...,m} a_{i,j}).
  • A soft-attention mechanism may be used to calculate an attention score or importance score between a diagnostic description and a plurality of ICD codes. An example of such a mechanism is described in Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, 2014, the disclosure of which is herein incorporated by reference.
  • Instead of choosing the single maximum attention score, as is done in the hard selection mechanism, the soft-attention mechanism applies a softmax function to normalize the attention scores among all diagnosis descriptions into a probability simplex. The normalized attention scores are utilized as the weights of different diagnosis descriptions. The weighted average over the hidden representations of different diagnosis descriptions is used as the attentional hidden vector. In this way, the attentional hidden vector can take into account all diagnosis descriptions with varying levels of attention.
  • In the soft-attention mechanism, the hidden representations of the diagnostic descriptions and codes are denoted as {h_m}_{m=1}^M and {u_n}_{n=1}^N respectively, where M is the number of diagnostic descriptions of one patient and N is the total number of codes in the dataset. The mapping from diagnostic descriptions to codes is not one-to-one. In many cases, a code is assigned only when a certain combination of K (1 < K ≤ M) diseases simultaneously appears within the M diagnostic descriptions, and the value of K depends on the code. Among the K diseases, their importance in determining the assignment of the code differs. For the remaining M − K diagnostic descriptions, their importance is considered to be zero.
  • For a code u_n, the importance of a diagnostic description h_m to u_n is calculated as a_{nm} = u_n^T h_m. The scores {a_{nm}}_{m=1}^M of all diagnostic descriptions are normalized into a probability simplex using the softmax operation: ã_{nm} = exp(a_{nm}) / Σ_{l=1}^M exp(a_{nl}). Given these normalized importance scores {ã_{nm}}_{m=1}^M, the scores are used to weight the representations of the diagnostic descriptions and obtain a single attentional vector of the M diagnostic descriptions: ĥ_n = Σ_{m=1}^M ã_{nm} h_m. The vectors ĥ_n and u_n are concatenated, and a linear classifier is used to predict the probability that code n should be assigned: p_n = sigmoid(w_n^T [ĥ_n; u_n] + b_n), where the coefficients w_n and bias b_n are specific to code n.
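  • The soft-attention computation just described reduces to a few lines. The NumPy sketch below evaluates p_n for one code; the shapes and parameters are illustrative, and the final comment notes the hard-selection alternative.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def code_probability(u_n, H, w_n, b_n):
        # u_n: representation of code n; H: (M x d) rows are h_1..h_M.
        a = H @ u_n                      # unnormalized scores a_nm = u_n^T h_m
        a_tilde = softmax(a)             # normalized importance scores
        h_hat = a_tilde @ H              # attentional vector over all descriptions
        return sigmoid(w_n @ np.concatenate([h_hat, u_n]) + b_n)

    rng = np.random.default_rng(0)
    d, M = 100, 4
    p_n = code_probability(rng.standard_normal(d), rng.standard_normal((M, d)),
                           rng.standard_normal(2 * d), 0.0)
    # Hard selection would instead use p_n = sigmoid(max_m a_nm).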
  • The weight parameters Θ of the model are trained using the data of L patient visits. Θ includes the SLSTM weights W_s, the TLSTM weights W_t, and the weights W_p in the final prediction layer. Let c^(l) ∈ ℝ^N be a binary vector where c_n^(l) = 1 if the n-th code is assigned to this patient and c_n^(l) = 0 otherwise. Θ can be learned by minimizing the following prediction loss:

$$\min_{\Theta}\ \mathcal{L}(\Theta) = \sum_{l=1}^{L}\sum_{n=1}^{N} \mathrm{CE}\big(p_n^{(l)},\, c_n^{(l)}\big), \tag{Eq. 12}$$

where p_n^(l) is the predicted probability that code n is assigned to patient visit l, p_n^(l) is a function of Θ, and CE(·, ·) is the cross-entropy loss.
  • Isotonic Constraints Module
  • Next, the importance order among ICD codes is incorporated. For the D^(l) codes assigned to patient l, without loss of generality, the order is assumed to be 1 ≽ 2 ≽ . . . ≽ D^(l) (the order is given by human coders as ground truth in the MIMIC-III dataset). The predicted probability p_i (1 ≤ i ≤ D^(l)) is used to characterize the importance of code i. To incorporate the order, an isotonic constraint is imposed on the probabilities, p_1^(l) ≥ p_2^(l) ≥ . . . ≥ p_{D^(l)}^(l), and the following problem is solved:

$$\min_{\Theta}\ \mathcal{L}_{pred}(\Theta) + \max_{W_d}\big(-\lambda\,\mathcal{L}_{adv}(W_s, W_d)\big)$$
$$\text{s.t.}\ \ p_1^{(l)} \ge p_2^{(l)} \ge \dots \ge p_{D^{(l)}}^{(l)}, \quad l = 1, \dots, L \tag{Eq. 13}$$
  • where the probabilities p_i^(l) are functions of Θ and λ is a tradeoff parameter.
  • An algorithm based on the alternating direction method of multipliers (ADMM) is developed to solve the problem defined in Eq. 13. Let p^(l) be a |D^(l)|-dimensional vector whose i-th element is p_i^(l). The problem is written in an equivalent form by introducing auxiliary variables q^(l):

$$\min_{\Theta, q}\ \mathcal{L}_{pred}(\Theta) + \max_{W_d}\big(-\lambda\,\mathcal{L}_{adv}(W_s, W_d)\big)$$
$$\text{s.t.}\ \ p^{(l)} = q^{(l)},\ \ q_1^{(l)} \ge q_2^{(l)} \ge \dots \ge q_{|D^{(l)}|}^{(l)},\ \ l = 1, \dots, L \tag{Eq. 14}$$
  • Then the augmented Lagrangian is written as:

$$\min_{\Theta, q, v}\ \mathcal{L}_{pred}(\Theta) + \max_{W_d}\big(-\lambda\,\mathcal{L}_{adv}(W_s, W_d)\big) + \sum_{l=1}^{L}\Big(\big\langle p^{(l)} - q^{(l)},\, v^{(l)}\big\rangle + \frac{\rho}{2}\big\| p^{(l)} - q^{(l)}\big\|_2^2\Big)$$
$$\text{s.t.}\ \ q_1^{(l)} \ge q_2^{(l)} \ge \dots \ge q_{|D^{(l)}|}^{(l)},\ \ l = 1, \dots, L \tag{Eq. 15}$$
  • This problem is solved by alternating among {p^(l)}_{l=1}^L, {q^(l)}_{l=1}^L, and {v^(l)}_{l=1}^L. The subproblem defined over q^(l) is

$$\min_{q^{(l)}}\ -\big\langle q^{(l)},\, v^{(l)}\big\rangle + \frac{\rho}{2}\big\| p^{(l)} - q^{(l)}\big\|_2^2 \quad \text{s.t.}\ \ q_1^{(l)} \ge q_2^{(l)} \ge \dots \ge q_{|D^{(l)}|}^{(l)}, \tag{Eq. 16}$$
  • which is an isotonic projection problem and can be solved via the algorithm proposed in Yao-Liang Yu and Eric P. Xing. Exact algorithms for isotonic regression and related. In Journal of Physics: Conference Series, volume 699, page 012016. IOP Publishing, 2016. With {q^(l)}_{l=1}^L and {v^(l)}_{l=1}^L fixed, the subproblem is min_Θ ℒ_pred(Θ) + max_{W_d}(−λ ℒ_adv(W_s, W_d)), which can be solved using stochastic gradient descent (SGD). The update of v^(l) is simple: v^(l) = v^(l) + ρ(p^(l) − q^(l)).
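  • Because Eq. 16 equals (ρ/2)·||q − (p + v/ρ)||² up to a constant in q, the q-update is a Euclidean projection of p^(l) + v^(l)/ρ onto the decreasing-ordered cone. The NumPy sketch below performs that projection with the classic pool-adjacent-violators routine, one standard way to solve isotonic projections (the cited exact algorithms may differ in detail).

    import numpy as np

    def project_decreasing(z):
        # Project z onto {q : q_1 >= q_2 >= ... >= q_n} by pooling adjacent
        # blocks whose means violate the non-increasing order.
        vals, sizes = [], []
        for x in z:
            vals.append(float(x)); sizes.append(1)
            while len(vals) > 1 and vals[-2] < vals[-1]:  # order violated
                v2, s2 = vals.pop(), sizes.pop()
                v1, s1 = vals.pop(), sizes.pop()
                vals.append((v1 * s1 + v2 * s2) / (s1 + s2))
                sizes.append(s1 + s2)
        return np.repeat(vals, sizes)

    def q_update(p, v, rho):
        # Minimizer of Eq. 16: isotonic projection of p + v/rho.
        return project_decreasing(p + v / rho)

    print(q_update(np.array([0.9, 0.4, 0.7, 0.2]), np.zeros(4), rho=1.0))
    # -> [0.9, 0.55, 0.55, 0.2], the closest decreasing sequence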
  • FIG. 4 is a block diagram of a computing device 400 that embodies the ICD coding system 100 of FIG. 1. The computing device 400 is specially configured to execute instructions related to the ICD code assignment process described above, including the application of machine-learned algorithms to diagnostic description records. Computers capable of being specially configured to execute such instructions may be in the form of a laptop, desktop, workstation, or other appropriate computers.
  • The computing device 400 includes a central processing unit (CPU) 402, a memory 404, e.g., random access memory, and a computer readable media 406 that stores program instructions that enable the CPU and memory to implement the functions of the diagnostic description encoding module 102 and the ICD code assignment module 104 of the ICD coding system 100 described above with reference to FIG. 1. The computing device 400 also includes a user interface 408 and a display 410, and an interface bus 412 that interconnects all components of the computing device.
  • Computer readable media 406 suitable for storing ICD coding system processing instructions includes all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, flash memory devices, magnetic disks, magneto-optical disks, and CD-ROM and DVD-ROM disks. In operation, the CPU 402 and memory 404 execute the ICD coding system processing instructions stored in the computer readable media 406 to thereby perform the functions of the diagnostic description encoding module 102 and the ICD code assignment module 104.
  • The user interface 408, which may be a keyboard or a mouse, and the display 410 allow a clinician to interface with the computing device 400. For example, a clinician seeking to obtain a set of ICD codes for a subject patient may input a diagnostic description record of the subject patient for processing. The clinician may then initiate execution of the ICD coding system processing instructions stored in the computer readable media 406 through the user interface 408, and await a display of the assigned ICD codes.
  • FIG. 5 is a schematic block diagram of an apparatus 500. The apparatus 500 may correspond to one or more processors configured to develop and train the machine-learned algorithm included in the ICD coding system of FIG. 1. The apparatus 500 may be embodied in any number of processor-driven devices, including, but not limited to, a server computer, a personal computer, one or more networked computing devices, an application-specific circuit, a minicomputer, a microcontroller, and/or any other processor-based device and/or combination of devices.
  • The apparatus 500 may include one or more processing units 502 configured to access and execute computer-executable instructions stored in at least one memory 504. The processing unit 502 may be implemented as appropriate in hardware, software, firmware, or combinations thereof. Software or firmware implementations of the processing unit 502 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described herein. The processing unit 502 may include, without limitation, a central processing unit (CPU), a digital signal processor (DSP), a reduced instruction set computer (RISC) processor, a complex instruction set computer (CISC) processor, a microprocessor, a microcontroller, a field programmable gate array (FPGA), a System-on-a-Chip (SOC), or any combination thereof. The apparatus 500 may also include a chipset (not shown) for controlling communications between the processing unit 502 and one or more of the other components of the apparatus 500. The processing unit 502 may also include one or more application-specific integrated circuits (ASICs) or application-specific standard products (ASSPs) for handling specific data processing functions or tasks.
  • The memory 504 may include, but is not limited to, random access memory (RAM), flash RAM, magnetic media storage, optical media storage, and so forth. The memory 504 may include volatile memory configured to store information when supplied with power and/or non-volatile memory configured to store information even when not supplied with power. The memory 504 may store various program modules, application programs, and so forth that may include computer-executable instructions that upon execution by the processing unit 502 may cause various operations to be performed. The memory 504 may further store a variety of data manipulated and/or generated during execution of computer-executable instructions by the processing unit 502.
  • The apparatus 500 may further include one or more interfaces 506 that may facilitate communication between the apparatus and one or more other apparatuses. For example, the interface 506 may be configured to receive records of diagnostic descriptions and records of ICD code descriptions. Communication may be implemented using any suitable communications standard. For example, a LAN interface may implement protocols and/or algorithms that comply with various communication standards of the Institute of Electrical and Electronics Engineers (IEEE), such as IEEE 802.11, while a cellular network interface implements protocols and/or algorithms that comply with various communication standards of the Third Generation Partnership Project (3GPP) and 3GPP2, such as 3G and 4G (Long Term Evolution), and of the Next Generation Mobile Networks (NGMN) Alliance, such as 5G.
  • The memory 504 may store various program modules, application programs, and so forth that may include computer-executable instructions that upon execution by the processing unit 502 may cause various operations to be performed. For example, the memory 504 may include an operating system module (O/S) 508 that may be configured to manage hardware resources such as the interface 506 and provide various services to applications executing on the apparatus 500.
  • The memory 504 stores additional program modules such as: (1) a DD encoding module that receives diagnostic descriptions and generates latent representations of the diagnostic descriptions in the form of encoded DD vectors; (2) an ICD encoding module 512 that receives ICD codes and generates latent representations of the codes in the form of encoded ICD vectors; (3) an adversarial reconciliation module 514 that reconciles the different writing styles of diagnostic descriptions and ICD code descriptions; (4) an attention matching module that maps diagnostic descriptions to ICD codes; and (5) an isotonic constraints module 518 that establishes an order of importance for ICD codes. Each of these modules includes computer-executable instructions that, when executed by the processing unit 502, cause various operations to be performed, such as the operations described above.
  • The apparatus 500 and modules disclosed herein may be implemented in hardware or software that is executed on a hardware platform. The hardware or hardware platform may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof, or any other suitable component designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.
  • Evaluation
  • We performed the study on the publicly available MIMIC-III dataset, which contains de-identified electronic health records (EHRs) of 58,976 patient visits at the Beth Israel Deaconess Medical Center from 2001 to 2012. Each EHR has a clinical note called a discharge summary, which contains multiple sections of information, such as ‘discharge diagnosis’, ‘past medical history’, etc. From the ‘discharge diagnosis’ and ‘final diagnosis’ sections, we extract the diagnosis descriptions (DDs) written by physicians. Each DD is a short phrase or sentence articulating a certain disease or condition. Medical coders perform ICD coding mainly based on DDs. Following such a practice, in this paper, we set the inputs of the automated coding model to be the DDs, while acknowledging that other information in the EHRs is also valuable and is referred to by coders for code assignment. For simplicity, we leave the incorporation of non-DD information to future study.
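  • As a rough illustration of this extraction step (the note formatting below is our assumption, not the extraction code actually used), the diagnosis sections can be located with a simple pattern match:

    import re

    # Assumes section headers such as 'Discharge Diagnosis:' followed by
    # free text up to the next blank line, with one DD per line.
    SECTION_RE = re.compile(
        r"(?:discharge|final)\s+diagnosis:?\s*(.*?)(?:\n\s*\n|\Z)",
        re.IGNORECASE | re.DOTALL,
    )

    def extract_dds(note):
        match = SECTION_RE.search(note)
        if not match:
            return []
        return [line.strip() for line in match.group(1).splitlines() if line.strip()]

    note = "Discharge Diagnosis:\n1. Suspicion for sepsis, ruled out\n\nMedications: ..."
    print(extract_dds(note))  # ['1. Suspicion for sepsis, ruled out']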
  • Each patient visit is assigned a list of ICD codes, ranked in descending order of importance and relevance. For each visit, the number of codes is usually not equal to the number of diagnosis descriptions. These ground truth codes serve as the labels to train our coding model. The entire dataset contains 6,984 unique codes, each of which has a textual description describing a disease, symptom, or condition. The codes are organized into a hierarchy in which the top-level codes correspond to general diseases while the bottom-level ones represent specific diseases. In the code tree, children of a node represent subtypes of a disease (a minimal sketch of such a tree follows Table 1). Table 1 below shows the diagnosis descriptions of a patient visit and the assigned ICD codes; the descriptions of the codes appear inside the parentheses, and the codes are ranked in descending order of importance.
  • TABLE 1

    Diagnosis Descriptions
    1. Prematurity at 35 4/7 weeks gestation
    2. Twin number two of twin gestation
    3. Respiratory distress secondary to transient tachypnea of the newborn
    4. Suspicion for sepsis, ruled out

    Assigned ICD Codes
    1. V31.00 (Twin birth, mate liveborn, born in hospital, delivered without mention of cesarean section)
    2. 765.18 (Other preterm infants, 2,000-2,499 grams)
    3. 775.6 (Neonatal hypoglycemia)
    4. 770.6 (Transitory tachypnea of newborn)
    5. V29.0 (Observation for suspected infectious condition)
    6. V05.3 (Need for prophylactic vaccination and inoculation against viral hepatitis)
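  • For illustration, a minimal sketch of the code tree described above (the field names are our assumptions) can be written as follows; the fragment reuses codes that appear in the ablation discussion later:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ICDNode:
        code: str
        description: str
        children: List["ICDNode"] = field(default_factory=list)

    # Fragment of the hierarchy: 585 (chronic kidney disease) with
    # stage-specific subtypes as children.
    ckd = ICDNode("585", "chronic kidney disease", [
        ICDNode("585.2", "chronic kidney disease, stage II (mild)"),
        ICDNode("585.4", "chronic kidney disease, stage IV (severe)"),
    ])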
  • Experimental Settings: Out of the 6,984 unique codes, we selected the 2,833 codes with the highest frequencies for the study. We split the data into train/validation/test sets of 40 k/7 k/12 k patient visits respectively. The hyperparameters were tuned on the validation set. The SLSTMs are bidirectional, and dropout with probability 0.5 was used. The size of the hidden states in all LSTMs was set to 100. The word embeddings were trained on the fly, with their dimension set to 200. The tradeoff parameter λ was set to 0.1, and the parameter ρ in the ADMM algorithm was set to 1. In the SGD algorithm for solving min_Θ L_pred(Θ) + max_{W_d} (−λ L_adv(W_s, W_d)), we used the Adam optimizer with an initial learning rate of 0.001 and a mini-batch size of 20. Sensitivity (true positive rate) and specificity (true negative rate) were used to evaluate the code assignment performance: we calculated these two scores for each individual code on the test set, then took a weighted average across all codes, with weights proportional to the codes' frequencies. To evaluate the ranking performance on codes, we used the normalized discounted cumulative gain (NDCG).
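  • The evaluation protocol just described can be sketched as follows (the aggregation details reflect our reading of the text above, not the original implementation):

    import numpy as np

    def weighted_sens_spec(y_true, y_pred, freqs):
        """y_true, y_pred: (visits, codes) binary arrays; freqs: (codes,)."""
        tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
        fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
        tn = ((y_true == 0) & (y_pred == 0)).sum(axis=0)
        fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
        sens = tp / np.maximum(tp + fn, 1)  # per-code true positive rate
        spec = tn / np.maximum(tn + fp, 1)  # per-code true negative rate
        w = freqs / freqs.sum()             # weights proportional to frequency
        return float((w * sens).sum()), float((w * spec).sum())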
  • Ablation Study: We performed an ablation study to verify the effectiveness of each module in our model. To evaluate a module X, we remove it from the model without changing the other modules and denote the resulting baseline No-X. Comparisons of each No-X with the full model are given in Table 2 below, which shows weighted sensitivity and specificity on the test set.
  • TABLE 2

    Method               Sensitivity   Specificity
    Larkey and Croft         0.15          0.17
    Franz et al.             0.19          0.21
    Pestian et al.           0.12          0.21
    Kavuluru et al.          0.09          0.11
    Kavuluru et al.          0.21          0.25
    Koopman et al.           0.18          0.20

    LET                      0.23          0.29
    HierNet                  0.26          0.30
    HybridNet                0.25          0.31
    BranchNet                0.25          0.29
    No-TLSTM                 0.23          0.28
    Bottom-up TLSTM          0.27          0.31

    No-AL                    0.26          0.31
    No-IC                    0.24          0.29
    No-AM                    0.27          0.29

    Our full model           0.29          0.33
  • The first panel contains baselines for holistic comparison. The second panel contains baselines compared in the ablation study of the tree-of-sequences LSTM for capturing hierarchical relationships. The third panel contains baselines compared in the ablation studies of adversarial learning for writing-style reconciliation, isotonic constraints for ranking, and attentional matching.
  • Tree-of-sequences LSTM: To evaluate this module, we compared with two configurations: (1) No-TLSTM, which removes the tree LSTM and directly uses the hidden states produced by the sequential LSTM as the final representations of codes; and (2) Bottom-up TLSTM, which removes the hidden states generated by the top-down TLSTM. In addition, we compared with four hierarchical classification baselines, namely (1) hierarchical network (HierNet), (2) HybridNet, (3) branch network (BranchNet), and (4) label embedding tree (LET), using each to replace the bidirectional tree LSTM while keeping the other modules untouched. Table 2 shows the average sensitivity and specificity scores achieved by these methods on the test set. We make the following observations. First, removing the tree LSTM largely degrades performance: the sensitivity and specificity of No-TLSTM are 0.23 and 0.28 respectively, while our full model (which uses the bidirectional TLSTM) achieves 0.29 and 0.33. The reason is that No-TLSTM ignores the hierarchical relationship among codes. Second, the bottom-up tree LSTM alone performs less well than the bidirectional tree LSTM. This demonstrates the necessity of the top-down TLSTM, which ensures that every two codes are connected by directed paths and which can more expressively capture code relations in the hierarchy. Third, our method outperforms the four baselines, possibly because it directly builds the codes' hierarchical relationships into their representations, while the baselines learn representations and capture hierarchical relationships separately.
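  • The following simplified sketch illustrates the bidirectional tree encoding idea on the ICDNode structure introduced after Table 1. It is an illustration under stated assumptions: GRU cells stand in for the LSTM units for brevity, the sequential-LSTM encodings of the code descriptions are supplied as inputs, and all sizes are toy values:

    import torch
    from torch import nn

    class BiTreeEncoder(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.up = nn.GRUCell(dim, dim)    # folds a child into its parent
            self.down = nn.GRUCell(dim, dim)  # propagates parent context down

        def bottom_up(self, node, seq_enc, out):
            h = seq_enc[node.code]  # sequential encoding of the description
            for child in node.children:
                h = self.up(self.bottom_up(child, seq_enc, out), h)
            out[node.code] = h
            return h

        def top_down(self, node, h_parent, up, out):
            h = self.down(h_parent, up[node.code])
            # Final representation: bottom-up and top-down states concatenated.
            out[node.code] = torch.cat([up[node.code], h], dim=-1)
            for child in node.children:
                self.top_down(child, h, up, out)

    enc = BiTreeEncoder(100)
    seq_enc = {c: torch.randn(1, 100) for c in ("585", "585.2", "585.4")}
    up, final = {}, {}
    enc.bottom_up(ckd, seq_enc, up)                    # ckd from the sketch above
    enc.top_down(ckd, torch.zeros(1, 100), up, final)  # final[c] has shape (1, 200)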
  • Next, we present some qualitative results. For a patient (admission ID 147798) having a DD ‘E Coli urinary tract infection’, without using the tree LSTM, two sibling codes, 585.2 (chronic kidney disease, stage II (mild)), which is the ground truth, and 585.4 (chronic kidney disease, stage IV (severe)), are simultaneously assigned, possibly because their textual descriptions are very similar (differing only in the level of severity). This is incorrect because 585.2 and 585.4 are children of 585 (chronic kidney disease) and the severity level of this disease cannot simultaneously be mild and severe. After the tree LSTM is added, the false prediction of 585.4 is eliminated, which demonstrates the effectiveness of the tree LSTM in incorporating one constraint induced by the code hierarchy: among the nodes sharing the same parent, only one should be selected.
  • For patient 197205, No-TLSTM assigns the following codes: 462 (subacute sclerosing panencephalitis), 790.29 (other abnormal glucose), 799.9 (unspecified viral infection), and 285.21 (anemia in chronic kidney disease). Among these codes, the first three are the ground truth and the fourth is incorrect (the ground truth is 401.9 (unspecified essential hypertension)). Adding the tree LSTM fixes this error. The average tree-distance between 401.9 and the rest of the ground truth codes is 6.2; for the incorrectly assigned code 285.21, this distance is 7.9. This demonstrates that the tree LSTM is able to capture another constraint imposed by the hierarchy: codes with smaller tree-distance are more likely to be assigned together.
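  • A small sketch of the tree-distance measure referenced above, under the standard reading that the distance between two codes is the number of edges on the path through their lowest common ancestor:

    def path_to_root(code, parents):
        path = [code]
        while code in parents:
            code = parents[code]
            path.append(code)
        return path

    def tree_distance(a, b, parents):
        pa = {c: i for i, c in enumerate(path_to_root(a, parents))}
        for j, c in enumerate(path_to_root(b, parents)):
            if c in pa:  # lowest common ancestor found
                return pa[c] + j
        return -1        # no common ancestor

    parents = {"585.2": "585", "585.4": "585"}
    assert tree_distance("585.2", "585.4", parents) == 2  # sibling codes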
  • Adversarial learning: To evaluate the efficacy of adversarial learning (AL), we remove it from the full model and refer to this baseline as No-AL. Specifically, in Eq. 13, the loss term max_{W_d} (−λ L_adv(W_s, W_d)) is taken away. Table 2 shows the results: after AL is removed, the sensitivity and specificity drop from 0.29 and 0.33 to 0.26 and 0.31 respectively. No-AL does not reconcile the different writing styles of diagnosis descriptions (DDs) and code descriptions (CDs). As a result, a DD and a CD that have similar semantics may be mismatched because their writing styles are different. For example, a patient (admission ID 147583) has a DD ‘h/o DVT on anticoagulation’, which contains the abbreviation DVT (deep vein thrombosis). Due to the presence of this abbreviation, it is difficult to assign a proper code to this DD, since the textual descriptions of codes do not contain abbreviations. With adversarial learning, our model correctly maps this DD to a ground truth code, 443.9 (peripheral vascular disease, unspecified); without AL, this code is not selected. As another example, a DD ‘coronary artery disease, STEMI, s/p 2 stents placed in RCA’ was given to patient 148532. This DD is written informally and ungrammatically, and contains much detailed information, e.g., ‘s/p 2 stents placed in RCA’. Such a writing style is quite different from that of CDs. With AL, our model successfully matches this DD to a ground truth code, 414.01 (coronary atherosclerosis of native coronary artery); No-AL fails to do so.
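  • A schematic sketch of this adversarial training follows. It is an illustrative assumption, not the original implementation: tiny linear layers stand in for the LSTM encoders (parameters Θ, including the style parameters W_s) and for the writing-style discriminator (parameters W_d):

    import torch
    from torch import nn

    LAMBDA = 0.1
    encoder = nn.Linear(200, 100)      # stand-in for the LSTM encoders
    classifier = nn.Linear(100, 2833)  # final code-assignment layer
    disc = nn.Linear(100, 2)           # writing-style discriminator (DD vs. CD)

    opt_theta = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=0.001)
    opt_wd = torch.optim.Adam(disc.parameters(), lr=0.001)
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

    def train_step(x, code_labels, style_labels):
        h = encoder(x)

        # max over W_d of (-lambda * L_adv): train the discriminator to
        # tell DD encodings from code-description encodings.
        opt_wd.zero_grad()
        ce(disc(h.detach()), style_labels).backward()
        opt_wd.step()

        # min over Theta of L_pred - lambda * L_adv: predict codes while
        # *increasing* the discriminator's loss, reconciling the styles.
        opt_theta.zero_grad()
        loss = bce(classifier(h), code_labels) - LAMBDA * ce(disc(h), style_labels)
        loss.backward()
        opt_theta.step()

    x = torch.randn(20, 200)  # mini-batch of size 20
    codes = torch.zeros(20, 2833); codes[0, 0] = 1.0
    styles = torch.randint(0, 2, (20,))
    train_step(x, codes, styles)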
  • Isotonic constraint (IC): To evaluate this ingredient, we remove the ICs from Eq. 13 during training and denote this baseline No-IC. We used NDCG to measure the ranking performance, calculated in the following way. Consider a testing patient-visit l whose ground truth ICD codes are C(l). For any code c, we define the relevance score of c to l as 0 if c ∉ C(l), and as |C(l)| − r(c) otherwise, where r(c) is the ground truth rank of c in C(l). We rank codes in descending order of their corresponding prediction probabilities and obtain the predicted rank of each code. We calculated the NDCG scores at positions 2, 4, 6, and 8 based on the relevance scores and predicted ranks, which are shown in Table 3:
  • TABLE 3

    Method    NDCG@2   NDCG@4   NDCG@6   NDCG@8
    No-IC      0.27     0.26     0.23     0.20
    IC         0.32     0.29     0.27     0.23
  • As can be seen, using IC achieves much higher NDCG than No-IC, which demonstrates the effectiveness of IC in capturing the importance order among codes.
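  • A minimal sketch of the NDCG computation described above (the logarithmic discount is the standard choice and an assumption here, as the text does not spell it out):

    import math

    def ndcg_at_k(truth_ranked, predicted_probs, k):
        """truth_ranked: ground-truth codes in descending importance;
        predicted_probs: dict mapping code -> predicted probability."""
        rel = {c: len(truth_ranked) - r
               for r, c in enumerate(truth_ranked, start=1)}  # |C(l)| - r(c)
        ranking = sorted(predicted_probs, key=predicted_probs.get, reverse=True)
        dcg = sum(rel.get(c, 0) / math.log2(i + 2)
                  for i, c in enumerate(ranking[:k]))
        idcg = sum(g / math.log2(i + 2)
                   for i, g in enumerate(sorted(rel.values(), reverse=True)[:k]))
        return dcg / idcg if idcg > 0 else 0.0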
  • We also evaluated how IC affects the sensitivity and specificity of code assignment. As can be seen from Table 2, No-IC degrades the two scores from 0.29 and 0.33 to 0.24 and 0.29 respectively, which indicates that IC is helpful in training a model that can more correctly assign codes. This is because IC encourages codes that are highly relevant to the patients to be ranked at top positions, which prevents the selection of irrelevant codes.
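  • The isotonic constraints require the predicted scores to be non-increasing in the ground-truth importance order. Within ADMM, the corresponding subproblem reduces to projecting a score vector onto that ordered cone, which a standard pool-adjacent-violators pass computes. The sketch below shows such a projection under that assumption; it is not necessarily the exact solver used:

    def project_descending(scores):
        """Euclidean projection of `scores` onto {s1 >= s2 >= ... >= sK}."""
        blocks = []  # each entry: [block mean, block size]
        for s in scores:
            blocks.append([float(s), 1])
            # Pool adjacent blocks while the descending order is violated.
            while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
                (m1, n1), (m2, n2) = blocks[-2], blocks[-1]
                blocks[-2:] = [[(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2]]
        out = []
        for mean, size in blocks:
            out.extend([mean] * size)
        return out

    assert project_descending([1.0, 3.0, 2.0]) == [2.0, 2.0, 2.0]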
  • Attentional matching (AM): In the evaluation of this module, we compared with a baseline, No-AM, which performs an unweighted average of the M DD representations, ĥ_n = (1/M) Σ_{m=1}^{M} h_m, concatenates ĥ_n with u_n, and feeds the concatenated vector into the final prediction layer. From Table 2, we can see that our full model (with AM) outperforms No-AM, which demonstrates the effectiveness of attentional matching: in determining whether a code should be assigned, different DDs have different importance weights, and No-AM ignores these weights, therefore performing less well.
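  • The contrast between the two pooling schemes can be sketched as follows; the dot-product scoring and the softmax normalization over the M DDs are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def no_am(H):
        """Unweighted average over the M DD encodings; H has shape (M, d)."""
        return H.mean(dim=0)

    def attentional_match(H, u):
        """Weight each DD by its normalized importance to code encoding u."""
        weights = F.softmax(H @ u, dim=0)  # one weight per DD
        return weights @ H                 # weighted combination, shape (d,)

    M, d = 4, 100
    H, u = torch.randn(M, d), torch.randn(d)
    pooled = torch.cat([attentional_match(H, u), u])  # input to prediction layer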
  • AM can correctly perform the many-to-one mapping from multiple DDs to a CD. For example, patient 190236 was given two DDs: ‘renal insufficiency’ and ‘acute renal failure’. AM maps them to a combined ICD code, 403.91 (hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease), which is in the ground truth provided by medical coders; No-AM fails to assign this code. Conversely, AM is able to correctly map a DD to multiple CDs. For example, a DD ‘congestive heart failure, diastolic’ was given to patient 140851. AM successfully maps this DD to two codes: (1) 428.0 (congestive heart failure, unspecified) and (2) 428.30 (diastolic heart failure, unspecified). Without AM, this DD is mapped only to 428.0.
  • Holistic comparison with other baselines: In addition to evaluating the four modules individually, we also compared our full model with four other baselines previously proposed for ICD coding. Table 2 shows the results. As can be seen, our approach achieves much better sensitivity and specificity scores. The reason our model works better is two-fold. First, it is based on deep neural networks, which arguably have better modeling power than the linear methods used in the baselines. Second, it is able to capture the hierarchical relationships and importance order among codes, can alleviate the discrepancy in writing styles, and allows flexible many-to-one and one-to-many mappings from DDs to CDs. The baselines possess none of these merits.
  • Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk (e.g., compact disk (CD), digital versatile disk (DVD)), a smart card, a flash memory device (e.g., card, stick, key drive), random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a general register, or any other suitable non-transitory medium for storing software.
  • While various embodiments have been described above, they have been presented by way of example only, and not by way of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosure, which is done to aid in understanding the features and functionality that can be included in the disclosure. The disclosure is not restricted to the illustrated example architectures or configurations, but can be implemented using a variety of alternative architectures and configurations.
  • In this document, the terms “module” and “engine” as used herein refer to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purposes of discussion, the various modules are described as discrete modules; however, as would be apparent to one of ordinary skill in the art, two or more modules may be combined to form a single module that performs the associated functions according to embodiments of the invention.
  • In this document, the terms “computer program product”, “computer-readable medium”, and the like may be used generally to refer to media such as memory storage devices or storage units. These and other forms of computer-readable media may be involved in storing one or more instructions for use by a processor to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system to perform the specified operations.
  • Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” and “known,” and terms of similar meaning, should not be construed as limiting the item described to a given time period or to an item available as of a given time. Instead, these terms should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future.
  • Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention. It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processing logic elements or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processing logic elements or controllers may be performed by the same processing logic element or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.
  • Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by, for example, a single unit or processing logic element. Additionally, although individual features may be included in different claims, these may advantageously be combined. The inclusion of features in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category; rather, the feature may be equally applicable to other claim categories, as appropriate.
  • The various aspects of this disclosure are provided to enable one of ordinary skill in the art to practice the present invention. Various modifications to exemplary embodiments presented throughout this disclosure will be readily apparent to those skilled in the art. Thus, the claims are not intended to be limited to the various aspects of this disclosure, but are to be accorded the full scope consistent with the language of the claims. All structural and functional equivalents to the various components of the exemplary embodiments described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. § 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.”

Claims (20)

What is claimed is:
1. A method of assigning a set of international classification of diseases (ICD) codes to a patient, the method comprising:
obtaining a diagnostic description vector from at least one diagnostic description record of the patient; and
applying a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
2. The method of claim 1, wherein obtaining a diagnostic description vector comprises processing the at least one diagnostic description record with a long short-term memory (LSTM) recurrent neural network.
3. The method of claim 2, wherein the diagnostic description vector comprises a plurality of hidden representations, each corresponding to a diagnostic description in the at least one diagnostic description record, and processing comprises obtaining the plurality of hidden representations using each of a character-level LSTM and a word-level LSTM.
4. The method of claim 2, wherein the LSTM is a sequential LSTM.
5. The method of claim 1, wherein applying a machine-learned ICD code assignment algorithm to the diagnostic description vector comprises:
for each of the diagnostic descriptions included in the diagnostic description vector, selecting one or more ICD codes to assign to the patient based on a mapping function maintained in the machine-learned ICD code assignment algorithm that maps diagnostic descriptions to one or more ICD codes.
6. The method of claim 5, wherein the mapping function maintained in the machine-learned ICD code assignment algorithm maps a diagnostic description to one or more ICD codes based on importance scores between the diagnostic description and a plurality of ICD codes.
7. The method of claim 6, wherein the importance score of a diagnostic description to an ICD code is normalized across the plurality of ICD codes.
8. The method of claim 6, wherein the plurality of ICD codes are included in an ICD vector obtained by processing at least one ICD code description record with a long short-term memory (LSTM) recurrent neural network.
9. The method of claim 8, wherein the LSTM recurrent neural network is a tree-of-sequences LSTM.
10. The method of claim 1, wherein the set of ICD codes assigned to the patient comprises a plurality of ICD codes, and further comprising applying an isotonic constraints algorithm to the plurality of ICD codes to obtain an order of importance among the plurality of ICD codes.
11. The method of claim 10, wherein the isotonic constraints algorithm is based on an alternating direction method of multipliers.
12. A system for assigning a set of international classification of diseases (ICD) codes to a patient, the system comprising:
a diagnostic description encoding module configured to obtain a diagnostic description vector from at least one diagnostic description record of the patient; and
an ICD code assignment module configured to apply a machine-learned ICD code assignment algorithm to the diagnostic description vector to assign a set of ICD codes to the patient.
13. The system of claim 12, wherein the diagnostic description encoding module obtains a diagnostic description vector by being configured to process the at least one diagnostic description record with a long short-term memory (LSTM) recurrent neural network.
14. The system of claim 12, wherein the ICD code assignment module is configured to, for each of the diagnostic descriptions included in the diagnostic description vector, select one or more ICD codes to assign to the patient based on a mapping function maintained in the machine-learned ICD code assignment algorithm that maps diagnostic descriptions to one or more ICD codes.
15. A machine learning apparatus for generating a map between diagnostic descriptions and international classification of diseases (ICD) codes, the apparatus comprising:
a processor; and
a memory coupled to the processor,
wherein the processor is configured to:
generate representations of diagnostic descriptions in the form of diagnostic descriptions vectors;
generate representations of ICD codes in the form of ICD vectors;
process the diagnostic descriptions vectors and the ICD vectors to obtain an importance score between each diagnostic description represented in a diagnostic description vector and each ICD represented in an ICD vector; and
associate each diagnostic description represented in the diagnostic description vector with one or more ICDs represented in the ICD vector based on the importance scores.
16. The machine learning apparatus of claim 15, wherein the processor generates representations of diagnostic descriptions in the form of diagnostic descriptions vectors by processing at least one diagnostic description record with a long short-term memory (LSTM) recurrent neural network.
17. The machine learning apparatus of claim 16, wherein the LSTM is a sequential LSTM.
18. The machine learning apparatus of claim 15, wherein the processor generates representations of ICD codes in the form of ICD vectors by processing at least one ICD code record with a long short-term memory (LSTM) recurrent neural network.
19. The machine learning apparatus of claim 18, wherein the LSTM is a tree-of-sequences LSTM.
20. The machine learning apparatus of claim 15, wherein the processor is further configured to establish an order of importance for ICD codes in instances where a plurality of ICD codes are associated with a diagnostic description.
Priority Applications (1)

Application Number: US16/207,119; Priority Date: 2018-07-17; Filing Date: 2018-12-01; Title: Systems and Methods for Automatically Generating International Classification of Diseases Codes for a Patient Based on Machine Learning; Status: Abandoned; Publication: US20200027567A1.

Applications Claiming Priority (3)

US201862699385P, filed 2018-07-17
US201862756024P, filed 2018-11-05
US16/207,119, filed 2018-12-01

Publications (1)

US20200027567A1, published 2020-01-23
