WO2022119922A2 - Prediction of the meaning of abbreviation instances - Google Patents

Prediction of the meaning of abbreviation instances

Info

Publication number
WO2022119922A2
WO2022119922A2 PCT/US2021/061408
Authority
WO
WIPO (PCT)
Prior art keywords
training
level
cluster
level cluster
inputs
Prior art date
Application number
PCT/US2021/061408
Other languages
English (en)
Other versions
WO2022119922A3 (fr)
Inventor
Daqing HE
Zhendong Wang
Zhimeng LUO
Rebecca JACOBSON
Girish Ramseh CHAVAN
Eugene TSEYTLIN
Melissa CASTINE
Wei Wei
Original Assignee
University Of Pittsburgh - Of The Commonwealth System Of Higher Education
UPMC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Pittsburgh - Of The Commonwealth System Of Higher Education, UPMC
Publication of WO2022119922A2
Publication of WO2022119922A3

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/274Converting codes to words; Guess-ahead of partial word inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input.
  • Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer.
  • Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
  • Medical documents can include abbreviations whose meaning is vague, ambiguous, or otherwise subject to multiple possible interpretations. For example, in one medical document corresponding to a first patient, the abbreviation “CP” might mean “cerebral palsy,” while in a different medical document corresponding to a second patient, the abbreviation “CP” might mean “chest pain.” So that safe and effective care can be provided to patients, it is important that the correct meaning of each abbreviation in each medical document be determined.
  • This specification describes a system implemented as computer programs on one or more computers in one or more locations that implements a machine learning model configured to process a model input characterizing an abbreviation instance and to generate a model output characterizing a predicted meaning of the abbreviation instance.
  • This specification also describes a system for training the machine learning model.
  • an “abbreviation” is a shortened form of a word or phrase.
  • Abbreviations can include acronyms; an acronym is a shortened form of a phrase that can be formed from the initial letters of the words in the phrase, or another subset of letters from the words in the phrase.
  • Multiple words or phrases can have the same abbreviation (e.g., “cerebral palsy” and “chest pain” can both have the abbreviation “CP”).
  • an “abbreviation instance” is a particular instance of an abbreviation in a text document.
  • the same abbreviation can have multiple instances within the same document and/or across multiple different documents.
  • the “semantic meaning” (or, simply, “meaning”) of an abbreviation instance is the word or phrase that the abbreviation instance stands for.
  • Different abbreviation instances corresponding to the same abbreviation can have different semantic meanings (e.g., a first instance of “CP” can have the semantic meaning of “chest pain” and a second instance of “CP” can have the semantic meaning of “cerebral palsy”).
  • a system can efficiently generate a diverse labeled training data set for training a machine learning model to predict the semantic meaning of abbreviation instances.
  • the system can leverage a clustering of training inputs in an unlabeled training data set to reduce the number of training inputs in the unlabeled training data set that have to be manually labeled by a user (typically a human). That is, the system can use i) a first subset of training inputs that have been labeled by a user and ii) the clustering of all the training inputs to automatically label a second subset of the training inputs (e.g., the remainder of the training inputs in the unlabeled training data set). As particular examples, the system can reduce the number of training inputs that need to be labeled by a user by 10x, 100x, or 1000x.
  • a user can label only 1/10th, 1/100th, or 1/1000th of the unlabeled training data set, and the system can use the labeled subset to automatically label the remaining training inputs in the unlabeled training data set.
  • the system can significantly reduce the time and human effort required to generate the labeled training data set for training the machine learning model.
  • the system can cluster the training inputs using a parallelized clustering technique executed on a distributed computing system, further improving the efficiency with which the system can generate the labeled training data set.
  • the parallelized clustering technique can reduce the time required to cluster the training inputs by 10x, 100x, or 1000x.
  • the system can leverage a two-phase clustering process to increase the diversity of the labeled training inputs in the labeled training data set.
  • a system might generate labels for many training inputs that are all similar to each other, leading to a machine learning model that is not properly trained to handle model inputs outside of the limited distribution of training inputs that have been labeled.
  • a system can obtain labels for training inputs assigned to different clusters that, according to the two-phase clustering process, include training inputs that are dissimilar.
  • a wider range of training inputs can be included in the labeled training data set, allowing the trained machine learning model to be robust to a wider distribution of model inputs at inference time, including model inputs that are relatively rare.
  • a machine learning model trained using the techniques described in this specification can improve the health outcomes of patients whose medical records are processed using the machine learning model.
  • An abbreviation instance in a medical record that is misinterpreted can lead to deleterious health outcomes, including incorrect diagnoses and incorrect medical procedures.
  • a machine learning model trained as described herein can predict, with extremely high accuracy, the correct semantic meaning for each abbreviation in each medical record, and thus can ensure that safe and effective care is provided to each patient.
  • FIG. 1 is a diagram of an example training system.
  • FIG. 2 and FIG. 3 are diagrams of example clustering engines.
  • FIG. 4 is a flow diagram of an example process for generating a labeled training data set for training a machine learning model to predict the meaning of abbreviation instances.
  • This specification generally describes systems, methods, devices, and other techniques for training a machine learning model configured to predict the meaning of abbreviation instances.
  • FIG. 1 is a diagram of an example training system 100.
  • the training system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.
  • the training system 100 is configured to train a machine learning model configured to process a model input characterizing an abbreviation instance and to generate a model output characterizing a prediction of the meaning of the abbreviation instance.
  • the training system 100 receives as input N text documents 102a-n, and uses training examples generated from abbreviation instances within the text documents 102a-n to generate a set of trained model parameters 162 for the machine learning model. That is, the set of trained model parameters 162 can include a respective trained value for each parameter of the machine learning model.
  • Text documents 102a-n can represent entire documents containing text, or just portions of such documents.
  • the training system 100 includes a tokenizer 110, an abbreviation detection system 120, an embedding engine 130, a clustering engine 140, a labeling engine 150, and a training engine 160.
  • the tokenizer 110 is configured to tokenize the N text documents 102a-n and to generate N corresponding tokenized documents 112a-n. That is, for each word in a document 102z, the tokenizer 110 generates a respective token in the corresponding tokenized document 112z.
  • Each token can include the corresponding word.
  • each token also includes a positional embedding that identifies the position of the word within the document. For example, the positional embedding can identify an index of the word within the document.
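  • As a rough illustration of this step, the sketch below pairs each word with a positional index; whitespace splitting and plain integer indices are assumptions, since the specification does not fix the tokenization scheme.

```python
from typing import List, Tuple

# Minimal tokenizer sketch: pair each whitespace-delimited word with its index in
# the document, using the index as a simple positional identifier.
def tokenize(document: str) -> List[Tuple[str, int]]:
    return [(word, index) for index, word in enumerate(document.split())]

tokens = tokenize("Patient reports CP and shortness of breath")
# [('Patient', 0), ('reports', 1), ('CP', 2), ('and', 3), ...]
```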
  • an embedding is an ordered collection of numeric values that represents an input in a particular embedding space.
  • an embedding can be a vector of floating point or other numeric values that has a fixed dimensionality.
  • the abbreviation detection system 120 is configured to process the N tokenized documents 112a-n and to identify each abbreviation instance 122 within the tokenized documents 112a-n. That is, the abbreviation detection system 120 is configured to identify each token that represents an abbreviation instance in the corresponding text document.
  • the abbreviation detection system 120 can compare each word represented by a token in the tokenized documents 112a-n against a set of “known” abbreviations, i.e., words that have been predetermined to be abbreviations. If the word matches one of the known abbreviations, then the abbreviation detection system 120 can determine that the token represents an abbreviation instance 122.
  • the abbreviation detection system 120 can compare each word represented by a token in the tokenized documents 112a-n against one or more regular expressions that represent respective classes of abbreviations. For example, the abbreviation detection system 120 can compare each token against the regular expression “[A-Z]+”, which represents any sequence of uppercase letters; the abbreviation detection system 120 can use this regular expression to detect acronyms. As another example, the abbreviation detection system 120 can compare each token against the regular expression “[A-Z ⁇ -_0-9#]+”. If the word represented by a token is an instance of one of the regular expressions, then the abbreviation detection system 120 can determine that the token represents an abbreviation instance 122.
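  • As an illustrative sketch of the matching described above, the snippet below combines a hypothetical known-abbreviation set with the “[A-Z]+” acronym pattern from the text; the set contents and the exact combination of checks are assumptions.

```python
import re

# Detect abbreviation instances by checking a known-abbreviation set and an
# acronym regular expression (any run of uppercase letters).
KNOWN_ABBREVIATIONS = {"CP", "SOB", "HTN"}      # hypothetical predetermined abbreviations
ACRONYM_PATTERN = re.compile(r"[A-Z]+")

def is_abbreviation_instance(word: str) -> bool:
    return word in KNOWN_ABBREVIATIONS or ACRONYM_PATTERN.fullmatch(word) is not None

print(is_abbreviation_instance("CP"))    # True
print(is_abbreviation_instance("pain"))  # False
```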
  • the abbreviation detection system 120 can process each token using a machine learning model that is configured to process tokens to predict whether the tokens represent abbreviation instances.
  • the machine learning model can be a character-level recurrent neural network that is configured to process a sequence of network inputs each representing a respective character of a word and to generate a prediction, based on the sequence of inputs, of whether the word is an abbreviation instance.
  • the machine learning model can be configured to generate a likelihood value representing a likelihood that the word is an abbreviation instance (e.g., a scalar value between 0 and 1), and the abbreviation detection system 120 can determine that the corresponding token represents an abbreviation instance 122 if the likelihood value exceeds a predetermined threshold (e.g., 0.5, 0.75, or 0.9).
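  • A minimal sketch of such a character-level recurrent detector is given below, assuming PyTorch, a GRU layer, ASCII character codes as inputs, and arbitrary layer sizes; none of these specifics come from the specification.

```python
import torch
import torch.nn as nn

class CharAbbreviationDetector(nn.Module):
    """Character-level recurrent model that outputs a likelihood that a word is an abbreviation."""
    def __init__(self, vocab_size: int = 128, embed_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, num_chars) integer code for each character of the word
        _, hidden = self.rnn(self.embed(char_ids))
        return torch.sigmoid(self.out(hidden[-1])).squeeze(-1)  # likelihood in [0, 1]

model = CharAbbreviationDetector()
word = torch.tensor([[ord(c) for c in "CP"]])   # encode characters as ASCII codes
is_abbreviation = model(word).item() > 0.5      # compare against a 0.5 threshold
```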
  • the abbreviation detection system 120 can filter out one or more common abbreviations whose meanings do not need to be predicted; that is, even if an abbreviation instance matching the common abbreviation is detected, e.g., using a regular expression or a machine learning model as described above, the abbreviation detection system 120 determines not to include the abbreviation instance in the set of detected abbreviation instances 122.
  • the common abbreviations can include “Dr.”, “Mr.”, “Ms.”, “Mrs.”, “vs.”, “e.g.”, “i.e.”, “a.m.”, and/or “p.m.”.
  • the abbreviation detection system 120 can filter out abbreviation instances that match one or more “blacklisted” regular expressions. That is, even if the abbreviation instance is identified, e.g., using a process described above, if the abbreviation instance matches one of the blacklisted regular expressions, then the abbreviation detection system 120 determines not to include the abbreviation instance in the set of detected abbreviation instances 122.
  • the blacklisted regular expressions can include “[0-9a-z]+”, “[A-Z][a-z]*”, “[A- year-old”, “’[sS]”,
  • the embedding engine 130 is configured to generate a respective embedding for each detected abbreviation instance 122.
  • in some implementations, the embedding engine 130 generates a “bag-of-words” embedding for each abbreviation instance 122 from representations (e.g., embeddings) of the k1 words preceding and the k2 words following the abbreviation instance 122 in the corresponding text document.
  • the embeddings for each determined preceding and subsequent word are obtained from a library of pre-trained word embeddings. If one of the preceding or subsequent words is not represented in the library of pre-trained word embeddings (e.g., if the preceding or subsequent word is itself an abbreviation), the embedding engine 130 can determine not to include an embedding for that word in the bag-of-words embedding for the abbreviation instance 122.
  • the embedding engine 130 can combine the pre-trained word embeddings for the (k1 + k2) other surrounding words to generate the embedding for the abbreviation instance 122. In some other implementations, the embedding engine 130 generates the embedding for each abbreviation instance 122 using an embedding neural network.
  • the embedding neural network can be a recurrent neural network, e.g., a long short-term memory (LSTM) neural network, that is configured to process an input sequence representing i) the k1 preceding words in the corresponding text document, ii) the abbreviation instance 122, and iii) the k2 subsequent words in the corresponding text document to generate the embedding for the abbreviation instance 122.
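  • The sketch below illustrates the bag-of-words variant, assuming the surrounding-word embeddings are averaged, the window sizes are k1 = k2 = 2, and a tiny in-memory embedding table stands in for the library of pre-trained word embeddings; all of these choices are placeholders rather than details taken from the specification.

```python
import numpy as np

def context_embedding(tokens, abbrev_index, word_vectors, k1=2, k2=2):
    """Average the pre-trained embeddings of the k1 preceding and k2 following words,
    skipping context words (e.g., other abbreviations) missing from the library."""
    window = (tokens[max(0, abbrev_index - k1):abbrev_index]
              + tokens[abbrev_index + 1:abbrev_index + 1 + k2])
    vectors = [word_vectors[w] for w in window if w in word_vectors]
    if not vectors:
        return np.zeros_like(next(iter(word_vectors.values())))
    return np.mean(vectors, axis=0)

word_vectors = {"chest": np.array([0.9, 0.1]), "pain": np.array([0.8, 0.2]),
                "reports": np.array([0.1, 0.7])}
tokens = ["patient", "reports", "CP", "with", "exertion"]
embedding = context_embedding(tokens, abbrev_index=2, word_vectors=word_vectors)
```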
  • Each embedding of an abbreviation instance 122 generated by the embedding engine 130 can be included as a respective training input in an unlabeled training data set 132. That is, each embedding of an abbreviation instance can be used as a training input to the machine learning model during training of the machine learning model.
  • the training inputs in the unlabeled training data set 132 do not have corresponding “labels,” i.e., ground-truth model outputs identifying the meaning of the corresponding abbreviation instances 122.
  • the clustering engine 140 is configured to process the training inputs (corresponding to respective embeddings of abbreviation instances) in the unlabeled training data set 132 to generate a clustered training data set 142 that identifies, for each training input, one or more clusters to which the training input has been assigned.
  • Each cluster includes one or more training inputs that have been determined to be “similar” according to one or more criteria.
  • the clusters can be determined based on a distance (e.g., dissimilarity) between the training inputs in the respective clusters based on one or more criteria. For example, training inputs can be determined to be similar if a distance between their corresponding embeddings, according to a distance metric (e.g., cosine similarity), is small. Training inputs can be determined to be dissimilar if a distance between their corresponding embeddings, according to a distance metric, is large.
  • the clustering engine 140 executes a two-phase clustering process.
  • the clustering engine 140 clusters the training inputs in the unlabeled training data set 132 into multiple first-level clusters.
  • the clustering engine 140 clusters the first-level clusters into multiple second-level clusters. That is, each second-level cluster includes one or more first-level clusters generated in the first phase, where the one or more first-level clusters in a particular second-level cluster have been determined to be similar according to one or more criteria.
  • the clustering engine 140 can determine, for each first-level cluster, a “centroid” of the first-level cluster that represents the mean training example of the training examples assigned to the first-level cluster.
  • the clustering engine 140 can then generate the second-level clusters by clustering the determined centroids, e.g., according to the distances between respective pairs of centroids in the embedding space of the embeddings of the abbreviation instances. This process is discussed in more detail below with reference to FIG. 2.
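  • A toy sketch of the two-phase process follows: DBSCAN groups the embeddings into first-level clusters, the centroid of each first-level cluster is computed, and a second clustering pass over the centroids yields the second-level clusters. The 2-D synthetic embeddings and the eps/min_samples values are illustrative assumptions, and the second pass here is a plain DBSCAN stand-in rather than the star clustering variant described below with reference to FIG. 2.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 2-D "embeddings": three tight groups, two of which lie near each other.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # group A
    [1.0, 0.0], [1.1, 0.0], [1.0, 0.1],   # group B (near A)
    [8.0, 8.0], [8.1, 8.0], [8.0, 8.1],   # group C (far away)
])

# Phase 1: cluster the embeddings into first-level clusters.
first_level = DBSCAN(eps=0.3, min_samples=2).fit_predict(embeddings)
cluster_ids = sorted(c for c in set(first_level) if c != -1)   # -1 marks noise points
centroids = np.stack([embeddings[first_level == c].mean(axis=0) for c in cluster_ids])

# Phase 2: cluster the first-level centroids; the nearby first-level clusters (A and B)
# land in the same second-level cluster, while the distant one (C) does not.
second_level = DBSCAN(eps=2.0, min_samples=1).fit_predict(centroids)
```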
  • the clustering engine 140 parallelizes the clustering process using a distributed computing system that includes multiple computing nodes. This process is discussed in more detail below with reference to FIG. 3.
  • one or more hyperparameters of the clustering process can be tuned during training of the machine learning model.
  • the training system 100 can generate a first set of trained model parameters 162 using a first set of hyperparameters for the clustering engine 140, and a second set of trained model parameters 162 using a different second set of hyperparameters for the clustering engine 140, and then determine an update to the values of the hyperparameters using a respective measure of the performance of the machine learning model when the machine learning model executes according to the two sets of trained model parameters 162 (e.g., using a computed prediction accuracy of the machine learning model).
  • a threshold for merging two clusters into a single cluster, as described below with reference to FIG. 3, can be tuned during training of the machine learning model.
  • a threshold value for performing the star clustering technique described below with reference to FIG. 2 can be tuned during training of the machine learning model.
  • a radius parameter ε for performing a DBSCAN clustering technique, as described below with reference to FIG. 2, can be tuned during training of the machine learning model.
  • the labeling engine 150 is configured to process the clustered training data set 142 and to generate a labeled training data set 152 by determining, for each of multiple training inputs in the clustered training data set 142, a label for the training input.
  • the label for each training input represents a ground-truth model output that the machine learning model should generate in response to processing the training input. That is, the label for each training input represents the true meaning of the abbreviation instance represented by the training input.
  • the labeling engine 150 determines a label for each training input in the clustered training data set 142. In some other implementations, the labeling engine 150 determines a label for only a subset of the training inputs in the clustered training data set 142. For example, if the labeling process requires user input and the clustered training data set 142 is very large, then it may be infeasible to obtain labels for each training input.
  • the labeling engine 150 can obtain the labels for the training inputs from a user of the labeling engine 150. That is, the user provides a user input that identifies, for one or more training inputs, the meaning of the abbreviation instances corresponding to the training inputs.
  • the labeling engine 150 can provide a prompt that identifies the one or more training inputs to the user, e.g., by displaying the prompt on a user device of the user.
  • the labeling engine 150 can display to the user the document 102a-n (or a portion of the document 102a-n) in which the abbreviation instance corresponding to the training input was detected.
  • the labeling engine 150 can also display to the user the one or more clusters to which the training input is assigned.
  • the user can provide the corresponding label to the labeling engine 150, e.g., using a graphical user interface of the user device.
  • the user can be a “domain expert,” i.e., the user can have knowledge about the domain of the abbreviations and thus be qualified to label the training inputs. For example, if the documents 102a-n are medical documents and the abbreviations are related to medical terms, the user can have experience working as a medical professional.
  • the labeling engine 150 can leverage the clusters of training inputs determined by the clustering engine 140 to make the labeling process more efficient.
  • the labeling engine 150 can use the clusters to determine a subset of the training inputs to present to the user, and use the labels for the training inputs in the subset provided by the user to determine labels for other training inputs that are not in the subset, e.g., each remaining training input in the clustered training data set 142 that is not in the subset.
  • the labeling engine 150 can provide a subset of the training inputs assigned to the cluster to the user for labeling. The labeling engine 150 can then determine whether each of the training inputs in the subset has the same label, i.e., whether the user determined that each of the training inputs in the subset has the same ground-truth output. If so, the labeling engine 150 can determine to automatically assign the same label to each of the remaining training inputs in the cluster that are not in the subset.
  • the labeling engine 150 can determine to automatically assign the same label to all of the training inputs in the cluster if the user assigns the same label to a threshold proportion of the training inputs in the cluster, e.g., 10%, 25%, 50%, 75%, or 90%.
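  • The following sketch shows one way such a propagation rule could be implemented, assuming training inputs are identified by string keys and using a 0.75 agreement threshold (one of the example proportions above); the data structures are hypothetical.

```python
from collections import Counter

def propagate_cluster_label(user_labels: dict, cluster_members: list, threshold: float = 0.75):
    """If a threshold proportion of the user-labeled members agree, give the rest the same label."""
    labeled = {m: user_labels[m] for m in cluster_members if m in user_labels}
    if not labeled:
        return {}
    label, count = Counter(labeled.values()).most_common(1)[0]
    if count / len(labeled) < threshold:
        return {}   # not enough agreement; leave the remaining members unlabeled
    return {m: label for m in cluster_members if m not in user_labels}

cluster = ["inst_1", "inst_2", "inst_3", "inst_4", "inst_5"]
user_labels = {"inst_1": "chest pain", "inst_2": "chest pain"}
auto_labels = propagate_cluster_label(user_labels, cluster)
# {'inst_3': 'chest pain', 'inst_4': 'chest pain', 'inst_5': 'chest pain'}
```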
  • the user can provide a user input to the labeling engine 150 instructing the labeling engine 150 to assign the same label to each training input in the cluster. That is, after reviewing and labeling a subset of the training inputs in a cluster, the user can determine that it is likely that the remaining training inputs in the cluster have the same label as the training inputs that the user already reviewed. The user can therefore instruct the labeling engine 150 to assign the same label to each training input in the cluster, and to move on and display training inputs corresponding to a different cluster.
  • the labeling engine 150 can leverage the second-level clusters to increase the diversity of the labeled training data set 152.
  • the labeling engine 150 can automatically determine to switch to labeling the training inputs in another first-level cluster that corresponds to a different second-level cluster than the particular first-level cluster. That is, the particular first-level cluster and the other first-level cluster were assigned to different second-level clusters. Because the training inputs in first-level clusters that have been assigned to the same second-level cluster have been determined to be similar, the labeling engine 150 can attempt to diversify the labeled training data set 152 by labeling first-level clusters that have been assigned to different second-level clusters and that therefore may include dissimilar training inputs.
  • the user can provide a user input to the labeling engine 150 instructing the labeling engine 150 to display training inputs corresponding to a different second-level cluster. That is, the user can determine that the training inputs corresponding to the particular second-level cluster that have been reviewed so far have been similar, and therefore instruct the labeling engine 150 to move on and display training inputs that correspond to a different second-level cluster and that therefore may be dissimilar to the training inputs reviewed so far.
  • the user can use the second-level clusters to ensure that all meanings of the abbreviation are adequately represented in the labeled training data set 152. Even if one of the meanings is more common than the others, it can be preferable for the labeled training data set 152 to be balanced and include multiple training inputs corresponding to each of the different meanings.
  • the user can determine that the labeled training data set 152 includes many training inputs corresponding to a first meaning of the abbreviation, but few training inputs corresponding to a second meaning of the abbreviation.
  • the user can then determine the particular first-level cluster to which the labeled training inputs corresponding to the second meaning of the abbreviation are assigned, and the particular second-level cluster to which the particular first-level cluster is assigned.
  • the user can then instruct the labeling engine 150 to display training inputs assigned to other first-level clusters assigned to the particular second-level cluster.
  • the user can attempt to increase the number of training inputs corresponding to the second meaning of the abbreviation by labeling training inputs in the same second-level cluster as the training inputs that have already been determined to have the second meaning.
  • the labeling engine 150 can determine to automatically assign the same label to all of the training inputs corresponding to a particular second-level cluster if the user assigns the same label to a threshold proportion of the training inputs corresponding to the particular second-level cluster.
  • the labeling engine 150 can determine to automatically assign the same label to each training input assigned to each first-level cluster in the particular second-level cluster if the user assigns the same label to each training input in a threshold proportion of the first-level clusters in the particular second-level cluster, e.g., 50%, 75%, or 90% of the first-level clusters in the particular second-level cluster.
  • the user can provide a user input to the labeling engine 150 instructing the labeling engine 150 to assign the same label to each training input in each first-level cluster assigned to the particular second-level cluster.
  • the training engine 160 is configured to process the labeled training data set 152 using the machine learning model and to generate the trained model parameters 162 for the machine learning model.
  • the machine learning model is configured to process the embedding of an abbreviation instance to generate a prediction of the meaning of the abbreviation instance.
  • the training engine 160 can generate the trained model parameters 162 using any appropriate supervised learning technique. For example, the training engine 160 can determine an error between i) the training output generated by the machine learning model in response to processing a training input and ii) the label corresponding to the training input, and use the determined error to update the parameters of the machine learning model. As a particular example, if the machine learning model is a neural network, then the training engine 160 can update the parameters of the neural network using backpropagation and stochastic gradient descent.
  • the machine learning model is a feedforward neural network that includes one or more feedforward neural network layers and that is configured to process the embedding of a particular abbreviation instance to generate a prediction of the meaning of the particular abbreviation instance.
  • the feedforward neural network can process the output of the final neural network layer in the feedforward neural network using a softmax function to generate, for each of multiple candidate meanings, a likelihood value that characterizes a likelihood that the particular abbreviation instance has the candidate meaning.
  • the feedforward neural network can then determine the predicted meaning of the particular abbreviation instance to be the candidate meaning with the highest likelihood value.
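  • A compact sketch of this classifier, one supervised update step, and the softmax-plus-argmax prediction is given below; it assumes PyTorch, arbitrary layer sizes, random placeholder data, and a two-meaning candidate list, with only the feedforward-plus-softmax structure, backpropagation, and gradient descent taken from the text.

```python
import torch
import torch.nn as nn

CANDIDATE_MEANINGS = ["cerebral palsy", "chest pain"]   # hypothetical candidate meanings

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, len(CANDIDATE_MEANINGS)))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                         # applies the softmax internally

# One training step: compare the model output with the label and backpropagate the error.
training_inputs = torch.randn(8, 16)                    # batch of abbreviation-instance embeddings
labels = torch.randint(len(CANDIDATE_MEANINGS), (8,))
loss = loss_fn(model(training_inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: softmax over the final layer, then pick the most likely candidate meaning.
with torch.no_grad():
    likelihoods = torch.softmax(model(torch.randn(1, 16)), dim=-1)
predicted_meaning = CANDIDATE_MEANINGS[likelihoods.argmax(dim=-1).item()]
```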
  • the machine learning model can be deployed onto an inference system.
  • the inference system can be configured to obtain new inputs to the machine learning model, corresponding to respective abbreviation instances identified in new documents, and to process the new inputs using the machine learning model to generate a respective predicted meaning for each abbreviation instance.
  • the machine learning model can be deployed onto a cloud system, i.e., a distributed computing system having multiple computing nodes, e.g., hundreds or thousands of computing nodes, in one or more locations.
  • the machine learning model can be deployed onto a user device, e.g., a mobile phone or laptop.
  • FIG. 2 is a diagram of an example clustering engine 200.
  • the clustering engine 200 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.
  • the clustering engine 200 can be a component of a training system, e.g., the training system 100 depicted in FIG. 1, that is configured to train a machine learning model to predict the meaning of abbreviation instances.
  • the clustering engine 200 is configured to obtain training inputs 202 that each include an embedding of a different abbreviation instance, and to cluster the training inputs 202 to generate respective clustered training inputs 222 that each identify one or more clusters to which the corresponding training input 202 is assigned.
  • the clustering engine 200 executes a two-phase clustering process.
  • a first-level clustering engine 210 executes the first phase and a second-level clustering engine 220 executes the second phase.
  • the clustering engine 200 can then provide the clustered training inputs 222 to a labeling engine, e.g., the labeling engine 150 depicted in FIG. 1, that is configured to determine labels for the training inputs, e.g., according to user inputs provided by a user.
  • the first-level clustering engine 210 processes the training inputs 202 to generate a set of first-level clusters 212 that each include one or more training inputs 202.
  • the first-level clustering engine 210 can assign the training inputs 202 to the first-level clusters 212 according to the distance between respective pairs of embeddings of abbreviation instances in the embedding space of the embeddings.
  • the first-level clustering engine 210 can cluster the training inputs 202 according to a density-based clustering technique that does not predetermine the number of first-level clusters 212.
  • the first-level clustering engine 210 can cluster the training inputs 202 using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
  • Density-based clustering techniques can ensure that the training inputs assigned to the same first-level cluster are highly similar. Therefore, if multiple different training inputs assigned to a particular first-level cluster have the same label, then the labeling engine or the user can, with high confidence, predict that the other training inputs assigned to the particular first-level cluster have the same label, and automatically assign the other training inputs the same label as described above.
  • density-based clustering techniques can accurately identify different clusters that are not linearly separable.
  • the second-level clustering engine 220 processes the first-level clusters 212 to generate a set of second-level clusters that each include one or more first-level clusters 212. For each first-level cluster 212, the second-level clustering engine 220 can determine a centroid of the first-level cluster 212. For example, the second-level clustering engine 220 can compute the mean of the embeddings of abbreviation instances corresponding to the training inputs 202 assigned to the first-level cluster 212. The second-level clustering engine 220 can then cluster the centroids to generate the second-level clusters. For each centroid in a particular second-level cluster, the second-level clustering engine 220 can assign the corresponding first-level cluster 212 to the particular second-level cluster.
  • the second-level clustering engine 220 performs a “star” clustering technique to generate the second-level clusters. For each determined second-level cluster, one of the first-level clusters assigned to the second-level cluster is designated the “star” of the second-level cluster, while the other first-level clusters assigned to the second-level cluster are designated the “satellites” of the second-level cluster.
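  • The sketch below implements a conventional star clustering pass over the first-level centroids: build a thresholded cosine-similarity graph, then repeatedly promote the highest-degree unassigned centroid to a “star” and attach its unassigned neighbors as “satellites.” The cosine metric and the 0.8 threshold are assumptions.

```python
import numpy as np

def star_clusters(centroids: np.ndarray, threshold: float = 0.8):
    """Greedy star clustering: each cluster is a star centroid plus its satellite neighbors."""
    norms = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    similarity = norms @ norms.T
    adjacency = (similarity >= threshold) & ~np.eye(len(centroids), dtype=bool)
    unassigned = set(range(len(centroids)))
    clusters = []
    while unassigned:
        # The unassigned centroid with the most unassigned neighbors becomes the star.
        star = max(unassigned, key=lambda i: adjacency[i, list(unassigned)].sum())
        satellites = {j for j in unassigned if adjacency[star, j]}
        clusters.append((star, satellites))
        unassigned -= satellites | {star}
    return clusters

centroids = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(star_clusters(centroids))   # e.g., [(0, {1}), (2, set())]
```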
  • the labeling engine can prompt the user to label training examples in the respective “star” first-level cluster of each second-level cluster before prompting the user to label training examples in any “satellite” first-level clusters of respective second-level clusters.
  • the labeling engine can quickly diversify the labeled training inputs in a labeled training data set by obtaining labels for training inputs corresponding to dissimilar second-level clusters.
  • the labeling engine when prompting the user to label a particular training input, can display the first-level cluster and the second-level cluster to which the particular training input is assigned. In some such implementations, the labeling engine can allow the user to “navigate” to another first-level cluster within the same second-level cluster, or to another first-level cluster assigned to a different second-level cluster. That is, the labeling engine can allow the user to submit a user input instructing the labeling engine to display training examples assigned to the other first-level cluster. Therefore, the user can select the first-level cluster and/or the second-level cluster for which to determine labeled training inputs, e.g., in order to increase diversity of the labeled training data set or to increase efficiency of the labeling process.
  • FIG. 3 is a diagram of an example clustering engine 300.
  • the clustering engine 300 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.
  • the clustering engine 300 can be a component of a training system, e.g., the training system 100 depicted in FIG. 1, that is configured to train a machine learning model to predict the meaning of abbreviation instances.
  • the clustering engine 300 is one of multiple clustering engines in the training system; for example, the clustering engine 300 can be the first-level clustering engine 210 or the second-level clustering engine 220 depicted in FIG. 2.
  • the clustering engine 300 is configured to obtain training inputs 302 that each include an embedding of a different abbreviation instance, and to cluster the training inputs 302 to generate respective clustered training inputs 332 that each identify one or more clusters to which the corresponding training input 302 is assigned.
  • the clustering engine 300 is configured to perform a parallelized clustering technique on a distributed computing system that includes multiple nodes.
  • the clustering engine 300 includes P worker nodes 320a-p and a master node 330 of the distributed computing system.
  • the clustering engine 300 also includes a distribution engine 310.
  • the distribution engine 310 is configured to distribute the training inputs 302 to respective worker nodes 320a-p. That is, the distribution engine 310 divides the set of training inputs 302 into P subsets 312a-p, and provides each subset 312a-p to a respective worker node 320a-p. Typically, each training input 302 belongs to exactly one subset 312a-p.
  • the distribution engine 310 determines the subsets 312a-p randomly. That is, the distribution engine 310 randomly assigns each training input 302 to a respective subset 312a-p. In these implementations, each subset 312a-p typically includes the same, or a similar, number of training inputs 302.
  • the distribution engine 310 assigns each training input 302 to subsets 312a-p according to a location of the embedding of the corresponding abbreviation instance in the embedding space. That is, training inputs 302 in the same subset 312a-p can be close to each other in the embedding space. For example, for a particular subset 312a-p, the distribution engine 310 can i) assign a first training input 302 to the particular subset 312a-p, and then ii) determine the M training inputs 302 whose embeddings have the smallest distance to the embedding of the first training input 302 and assign the M determined training inputs 302 to the particular subset 312a-p. As another example, the distribution engine 310 can divide the embedding space into P sections, and assign each training input 302 whose embedding is in a particular section of the embedding space to the corresponding subset 312a-p.
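  • As a sketch of the random distribution strategy (the simpler of the two strategies described above), the snippet below shuffles the training inputs and splits them into P roughly equal subsets, one per worker node; P = 4 and the NumPy-array representation are assumptions.

```python
import numpy as np

def distribute_randomly(training_inputs: np.ndarray, num_workers: int = 4, seed: int = 0):
    """Randomly assign each training input to one of num_workers roughly equal subsets."""
    indices = np.random.default_rng(seed).permutation(len(training_inputs))
    return [training_inputs[part] for part in np.array_split(indices, num_workers)]

subsets = distribute_randomly(np.random.rand(1000, 16))
# len(subsets) == 4, each holding roughly 250 embeddings for one worker node
```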
  • the worker nodes 320a-p are each configured to process the respective subset 312a-p of training inputs to cluster the subset 312a-p of training inputs into one or more clusters. That is, each worker node 320a-p generates a respective clustered subset 322a-p of training inputs that identifies, for each training input in the subset, a respective cluster to which the training input is assigned.
  • the master node 330 is configured to obtain the P clustered subsets 322a-p of training inputs and to merge the clustered subsets 322a-p to generate the set of clustered training inputs 332.
  • Each training input 302 is represented exactly once in the clustered subsets 322a-p, and each training input 302 is represented exactly once in the set of clustered training inputs 332.
  • the master node 330 can compare the pair of clusters against one or more similarity criteria. If the pair of clusters meet the one or more similarity criteria, then the master node 330 can determine to merge the pair of clusters into a single cluster, i.e., to assign the training inputs in the pair of clusters to the same cluster in the set of clustered training inputs 332. That is, for each pair of clustered subsets 322a-p, the master node 330 can compare each cluster represented in a first clustered subset 322a-p of the pair with each cluster represented in a second clustered subset 322a-p of the pair.
  • the master node 330 can determine, for each cluster of training inputs 302 represented by a clustered subset 322a-p, a centroid of the cluster. Then, when comparing a pair of clusters corresponding to respective clustered subsets 322a-p, the master node 330 can compare the corresponding pair of centroids. For example, the master node 330 can determine to merge the pair of clusters if a distance between the corresponding pair of centroids in the embedding space is below a predetermined threshold according to a distance metric (e.g., cosine similarity).
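  • A minimal sketch of this centroid-based merge test follows, using cosine distance between the two cluster centroids and an assumed threshold of 0.1; the specification leaves the exact metric and threshold open.

```python
import numpy as np

def should_merge(cluster_a: np.ndarray, cluster_b: np.ndarray, threshold: float = 0.1) -> bool:
    """Merge two clusters (from different worker nodes) when their centroids are close."""
    centroid_a, centroid_b = cluster_a.mean(axis=0), cluster_b.mean(axis=0)
    cosine_similarity = centroid_a @ centroid_b / (
        np.linalg.norm(centroid_a) * np.linalg.norm(centroid_b))
    return (1.0 - cosine_similarity) < threshold   # small cosine distance, so merge

cluster_from_worker_1 = np.array([[1.0, 0.0], [0.9, 0.1]])
cluster_from_worker_2 = np.array([[0.95, 0.05], [1.0, 0.1]])
print(should_merge(cluster_from_worker_1, cluster_from_worker_2))   # True
```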
  • the master node 330 can compare a respective single point (corresponding to a particular training input 302) from each cluster. For example, the master node 330 can randomly sample a single point from each cluster. As another example, the master node 330 can determine, for each cluster, the point in the cluster that is closest to the centroid of the other cluster. As another example, the master node 330 can determine the pair of points that have the smallest distance, out of each pair of points including a first point from the first cluster of the pair of clusters and a second point from the second cluster of the pair of clusters. The master node 330 can then determine to merge the pair of clusters if a distance between the selected pair of points in the embedding space is below a predetermined threshold according to a distance metric (e.g., cosine similarity).
  • the master node 330 can compute a new centroid (or sample a new point) for the merged cluster, and use the new centroid to compare the merged cluster with the other clusters represented in respective clustered subsets 322a-p.
  • the clustering engine 300 can provide the set of clustered training inputs 332 to a labeling engine, e.g., the labeling engine 150 depicted in FIG. 1, that is configured to use the clusters of the training inputs 302 to determine labels for the training inputs 302.
  • the clustering engine 300 can cluster a set of centroids each corresponding to a respective first-level cluster using the parallelized clustering technique to generate a set of second-level clusters; for example, the clustering engine can be the second-level clustering engine 220 described above with reference to FIG. 2.
  • FIG. 4 is a flow diagram of an example process 400 for generating a labeled training data set for training a machine learning model to predict the meaning of abbreviation instances.
  • the process 400 will be described as being performed by a system of one or more computers located in one or more locations.
  • a training system e.g., the training system 100 depicted in FIG. 1, appropriately programmed in accordance with this specification, can perform the process 400.
  • the system obtains training inputs for the machine learning model (step 402).
  • Each training input represents an instance of a respective abbreviation.
  • the system can obtain multiple text documents, identify one or more abbreviation instances in each text document, and generate a respective training input for each identified abbreviation instance.
  • the system determines multiple first-level clusters of the training inputs (step 404). Each first-level cluster can include one or more training inputs.
  • the system determines multiple second-level clusters of the training inputs (step 406).
  • Each second-level cluster can include each of the training inputs that have been assigned to one or more particular first-level clusters. For example, the system can assign each first-level cluster to a single second-level cluster, and assign the training inputs of the first-level cluster to the second-level cluster.
  • the system obtains a single training label corresponding to a subset of the training inputs assigned to a particular first-level cluster (step 408). That is, each training input in the subset has the same single training label.
  • the training label identifies a ground-truth semantic meaning for the respective abbreviation represented by each training input in the subset.
  • the system assigns the single training label to the remainder of the training inputs in the particular first-level cluster (step 410). That is, for each training input that is not in the subset, the system assigns the same single training label to the training input.
  • the system obtains training labels for a second particular first-level cluster corresponding to a different second-level cluster than the particular first-level cluster (step 412).
  • the second particular first-level cluster can be dissimilar to the particular first-level cluster because they belong to respective different second-level clusters. Therefore, obtaining training labels for the second particular first-level cluster can increase the diversity of the labeled training data set.
  • After generating a labeled training data set according to the process 400, a training system can train the machine learning model using the labeled training data set.
  • an inference system can use the machine learning model to predict the semantic meaning of abbreviation instances.
  • the machine learning model can process an input representing one or more abbreviation instances, and generate an output indicating the semantic meaning(s) of the one or more abbreviation instances (e.g., scores indicating the likelihood that the abbreviation instance represents each of a set of multiple possible semantic meanings).
  • One or more most-likely semantic meanings for the input abbreviation instance can be selected based on the scores.
  • Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus.
  • the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • data processing apparatus refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
  • the term “database” is used broadly to refer to any collection of data: the data does not need to be structured in any particular way, or structured at all, and it can be stored on storage devices in one or more locations.
  • the index database can include multiple collections of data, each of which may be organized and accessed differently.
  • the term “engine” is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions.
  • an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
  • the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
  • Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
  • a central processing unit will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
  • the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser.
  • a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
  • Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.
  • Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework, a Microsoft Cognitive Toolkit framework, an Apache Singa framework, or an Apache MXNet framework.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client.
  • Data generated at the user device e.g., a result of the user interaction, can be received at the server from the device.
  • Embodiment 1 is a method comprising: obtaining a plurality of training inputs for a machine learning model configured to predict semantic meanings of abbreviation instances, wherein each training input represents an instance of a respective abbreviation; determining a plurality of first-level clusters of the training inputs, wherein each first-level cluster comprises a respective first plurality of training inputs; determining a plurality of second-level clusters of the training inputs, wherein each second-level cluster comprises the respective training inputs of one or more first-level clusters, wherein at least one second-level cluster comprises the respective training inputs of a plurality of first-level clusters; obtaining a single training label corresponding to a subset of the respective first plurality of training inputs of a particular first-level cluster, wherein the single training label identifies a ground-truth semantic meaning for the respective abbreviation instance represented by each training input in the subset; assigning the single training label to a remainder of the training inputs in the first plurality of training inputs of the particular first-level cluster; and obtaining training labels for a second particular first-level cluster corresponding to a different second-level cluster than the particular first-level cluster.
  • Embodiment 2 is the method of embodiment 1, further comprising training the machine learning model using the plurality of training inputs and the obtained training labels.
  • Embodiment 3 is the method of any one of embodiments 1 or 2, further comprising: obtaining a plurality of documents; identifying, for each document, one or more abbreviation instances in the document; and generating, for each identified abbreviation instance, a training input.
  • Embodiment 4 is the method of embodiment 3, wherein generating a training input for an identified abbreviation instance comprises: identifying a plurality of context words that each precede or follow the identified abbreviation instance in the corresponding document; obtaining a respective embedding for each identified context word; and processing the embeddings for the plurality of context words to generate the training input. (A sketch of this context-embedding step appears after this list of embodiments.)
  • Embodiment 5 is the method of any one of embodiments 1-4, wherein assigning the single training label to the remainder of the training inputs in the first plurality of training inputs of the particular first-level cluster comprises: obtaining a first user input comprising instructions to assign the single training label to the remainder of the training inputs.
  • Embodiment 6 is the method of any one of embodiments 1-5, wherein the single training label is assigned to the remainder of the training inputs when a number of training inputs in the subset of the first plurality of training inputs of the particular first-level cluster is determined to satisfy a criterion.
  • Embodiment 7 is the method of any one of embodiments 1-6, wherein obtaining training labels for a second particular first-level cluster corresponding to a different second-level cluster than the particular first-level cluster comprises: obtaining a second user input comprising instructions to obtain training labels for the second particular first-level cluster.
  • Embodiment 8 is the method of any one of embodiments 1-7, wherein determining the plurality of first-level clusters comprises determining the plurality of first-level clusters using a distributed computing system comprising a master node and a plurality of worker nodes.
  • Embodiment 9 is the method of embodiment 8, wherein determining the plurality of first-level clusters using a distributed computing system comprising a master node and a plurality of worker nodes comprises: distributing a respective subset of the plurality of training inputs to each of the plurality of worker nodes; determining, by each worker node and in parallel across the worker nodes, a plurality of candidate first-level clusters using the respective subset of the plurality of training inputs; obtaining, by the master node, data representing the respective plurality of candidate first-level clusters corresponding to each worker node; and determining the plurality of first-level clusters using the respective plurality of candidate first-level clusters corresponding to each worker node. (A sketch of this distributed clustering step appears after this list of embodiments.)
  • Embodiment 10 is the method of embodiment 9, wherein determining the plurality of first-level clusters using the respective plurality of candidate first-level clusters corresponding to each worker node comprises: for each first candidate first-level cluster corresponding to a first worker node and each second candidate first-level cluster corresponding to a different second worker node: determining a difference between the first candidate first-level cluster and the second candidate first-level cluster; and determining whether to combine the first candidate first-level cluster and the second candidate first-level cluster according to the determined difference.
  • Embodiment 11 is the method of embodiment 10, wherein: determining a difference between the first candidate first-level cluster and the second candidate first-level cluster comprises: determining a first centroid representing a measure of central tendency of the training inputs in the first candidate first-level cluster; determining a second centroid representing a measure of central tendency of the training inputs in the second candidate first-level cluster; and determining a distance between the first centroid and the second centroid; and determining whether to combine the first candidate first-level cluster and the second candidate first-level cluster comprises determining whether the distance between the first centroid and the second centroid satisfies a threshold.
  • Embodiment 12 is the method of any one of embodiments 1-11, wherein determining a plurality of second-level clusters of the training inputs comprises: determining, for each first-level cluster, a respective centroid representing a measure of central tendency of the first plurality of training inputs of the first-level cluster; and clustering the respective centroids of the plurality of first-level clusters.
  • Embodiment 13 is the method of any one of embodiments 1-12, wherein at least one second-level cluster comprises the respective training inputs of multiple first-level clusters.
  • Embodiment 14 is a system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the method of any one of embodiments 1 to 13.
  • Embodiment 15 is a computer storage medium encoded with a computer program, the program comprising instructions that are operable, when executed by data processing apparatus, to cause the data processing apparatus to perform the method of any one of embodiments 1 to 13.
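
A minimal sketch of the two-level clustering and label propagation described in embodiments 1 and 12: training inputs are grouped into first-level clusters, the centroids of the first-level clusters are clustered again to form second-level clusters, and a single label obtained for a subset of one first-level cluster is propagated to the remaining inputs of that cluster. The use of k-means, the cluster counts, and the function names are assumptions for illustration; the embodiments do not prescribe a particular clustering algorithm.

```python
# Illustrative sketch only: k-means, the cluster counts, and the function names
# are assumptions, not details taken from the specification.
import numpy as np
from sklearn.cluster import KMeans


def two_level_cluster(training_inputs: np.ndarray,
                      n_first_level: int = 50,
                      n_second_level: int = 10):
    """Clusters training inputs (one vector per abbreviation instance) at two levels."""
    # First-level clusters: group the individual training inputs.
    first_level = KMeans(n_clusters=n_first_level).fit(training_inputs)

    # Second-level clusters: cluster the centroids (a measure of central tendency)
    # of the first-level clusters, so each second-level cluster covers the training
    # inputs of one or more first-level clusters.
    second_level = KMeans(n_clusters=n_second_level).fit(first_level.cluster_centers_)
    return first_level, second_level


def propagate_label(first_level_assignments: np.ndarray,
                    cluster_id: int,
                    single_training_label: str,
                    labels: dict) -> dict:
    """Assigns the ground-truth meaning obtained for a labeled subset of one
    first-level cluster to the remainder of that cluster's training inputs."""
    for idx in np.flatnonzero(first_level_assignments == cluster_id):
        labels.setdefault(int(idx), single_training_label)
    return labels
```

Under this scheme, annotating only a handful of instances in a first-level cluster can label the whole cluster, while the second-level clusters group related first-level clusters so that labels can be sought for a different first-level cluster in another second-level cluster.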
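
A sketch of the training-input generation of embodiments 3 and 4: for each abbreviation instance identified in a document, the embeddings of the context words that precede or follow it are combined into one training input. The window size, the embedding table, and the use of mean pooling to combine the embeddings are illustrative assumptions; the embodiments only require that the context embeddings be processed into a training input.

```python
# Illustrative sketch only: the window size, the embedding table, and mean
# pooling are assumptions, not details taken from the specification.
import numpy as np


def make_training_input(tokens: list[str],
                        abbrev_index: int,
                        embedding_lookup: dict[str, np.ndarray],
                        window: int = 5,
                        dim: int = 300) -> np.ndarray:
    """Returns a fixed-size vector for the abbreviation instance at `abbrev_index`."""
    # Context words that precede or follow the abbreviation instance.
    left = tokens[max(0, abbrev_index - window):abbrev_index]
    right = tokens[abbrev_index + 1:abbrev_index + 1 + window]
    context = left + right

    # Look up an embedding for each context word; unknown words get a zero vector.
    embeddings = [embedding_lookup.get(word, np.zeros(dim)) for word in context]

    # Process the context embeddings into a single training input (mean pooling here).
    return np.mean(embeddings, axis=0) if embeddings else np.zeros(dim)
```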

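A sketch of the distributed first-level clustering of embodiments 8-11: each worker node clusters its own subset of the training inputs in parallel, and the master step merges candidate clusters whose centroid distance satisfies a threshold. Python's multiprocessing stands in for a real distributed system, k-means and the threshold value are assumptions, and for brevity the merge step compares all candidate clusters rather than only pairs originating from different workers.

```python
# Illustrative sketch only: multiprocessing stands in for a distributed system,
# and k-means, the per-worker cluster count, and the threshold are assumptions.
import numpy as np
from multiprocessing import Pool
from sklearn.cluster import KMeans


def worker_cluster(subset: np.ndarray, n_clusters: int = 20) -> list[np.ndarray]:
    """One worker node: returns candidate first-level clusters for its subset."""
    km = KMeans(n_clusters=n_clusters).fit(subset)
    return [subset[km.labels_ == c] for c in range(n_clusters)]


def merge_candidates(candidates: list[np.ndarray], threshold: float = 0.5) -> list[np.ndarray]:
    """Master step: combines candidate clusters whose centroids are close."""
    merged: list[np.ndarray] = []
    for cluster in candidates:
        centroid = cluster.mean(axis=0)  # measure of central tendency
        for i, existing in enumerate(merged):
            if np.linalg.norm(existing.mean(axis=0) - centroid) < threshold:
                merged[i] = np.concatenate([existing, cluster])
                break
        else:
            merged.append(cluster)
    return merged


def distributed_first_level(training_inputs: np.ndarray, n_workers: int = 4) -> list[np.ndarray]:
    # Distribute a respective subset of the training inputs to each worker.
    subsets = np.array_split(training_inputs, n_workers)
    with Pool(n_workers) as pool:
        per_worker = pool.map(worker_cluster, subsets)
    # Flatten the candidate clusters from all workers, then merge by centroid distance.
    return merge_candidates([c for worker in per_worker for c in worker])


# Example usage (the entry-point guard matters because multiprocessing spawns processes):
# if __name__ == "__main__":
#     clusters = distributed_first_level(np.random.rand(1000, 300))
```
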
Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the meaning of abbreviation instances. One of the methods includes obtaining training inputs for a machine learning model configured to predict semantic meanings of abbreviation instances; determining a plurality of first-level clusters of the training inputs; determining a plurality of second-level clusters of the training inputs, each second-level cluster comprising the respective training inputs of one or more first-level clusters; obtaining a single training label corresponding to a subset of the respective first plurality of training inputs of a particular first-level cluster; assigning the single training label to the remainder of the training inputs in the first plurality of training inputs of the particular first-level cluster; and obtaining training labels for a second particular first-level cluster corresponding to a different second-level cluster than the particular first-level cluster.
PCT/US2021/061408 2020-12-04 2021-12-01 Prédiction de la signification d'instances d'abréviation WO2022119922A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063121833P 2020-12-04 2020-12-04
US63/121,833 2020-12-04
US202163158597P 2021-03-09 2021-03-09
US63/158,597 2021-03-09

Publications (2)

Publication Number Publication Date
WO2022119922A2 true WO2022119922A2 (fr) 2022-06-09
WO2022119922A3 WO2022119922A3 (fr) 2022-08-18

Family

ID=81854917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/061408 WO2022119922A2 (fr) 2020-12-04 2021-12-01 Prédiction de la signification d'instances d'abréviation

Country Status (1)

Country Link
WO (1) WO2022119922A2 (fr)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8370361B2 (en) * 2011-01-17 2013-02-05 Lnx Research, Llc Extracting and normalizing organization names from text
US10839285B2 (en) * 2017-04-10 2020-11-17 International Business Machines Corporation Local abbreviation expansion through context correlation

Also Published As

Publication number Publication date
WO2022119922A3 (fr) 2022-08-18

Similar Documents

Publication Publication Date Title
US11087201B2 (en) Neural architecture search using a performance prediction neural network
US11443170B2 (en) Semi-supervised training of neural networks
US11868888B1 (en) Training a document classification neural network
US10083169B1 (en) Topic-based sequence modeling neural networks
US9037464B1 (en) Computing numeric representations of words in a high-dimensional space
US20180189950A1 (en) Generating structured output predictions using neural networks
JP2019537809A (ja) ポインタセンチネル混合アーキテクチャ
CN111539197B (zh) 文本匹配方法和装置以及计算机系统和可读存储介质
US20150178383A1 (en) Classifying Data Objects
US20220075944A1 (en) Learning to extract entities from conversations with neural networks
EP3411835B1 (fr) Augmentation des réseaux neuronals par mémoire hiérarchique externe
EP3100212A1 (fr) Génération de représentations vectorielles de documents
US20230049747A1 (en) Training machine learning models using teacher annealing
EP3314461A1 (fr) Intégrations d'entités et de mots d'apprentissage pour la désambiguïsation d'entités
WO2018093935A1 (fr) Formation de réseaux neuronaux utilisant une perte de regroupement
WO2018039510A1 (fr) Entraînement de modèle augmenté de récompense
EP3779728A1 (fr) Dispositif de prédiction de phénomène, dispositif de génération de modèle de prédiction et programme de prédiction de phénomène
US20210004689A1 (en) Training neural networks using posterior sharpening
US20220230061A1 (en) Modality adaptive information retrieval
CN110678882A (zh) 使用机器学习从电子文档选择回答跨距
US11409963B1 (en) Generating concepts from text reports
US20230205994A1 (en) Performing machine learning tasks using instruction-tuned neural networks
US20200104686A1 (en) Decreasing neural network inference times using softmax approximation
WO2016210203A1 (fr) Intégrations d'entités et de mots d'apprentissage pour la désambiguïsation d'entités
US20230244934A1 (en) Augmenting machine learning language models using search engine results

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21901382

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 21901382

Country of ref document: EP

Kind code of ref document: A2