WO2022091413A1 - Machine learning program, estimation program, device, and method - Google Patents

Machine learning program, estimation program, device, and method

Info

Publication number
WO2022091413A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
ontology
embedded vector
graph data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/041077
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
孝典 鵜飼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to PCT/JP2020/041077 (WO2022091413A1, ja)
Priority to EP20959928.1A (EP4239535A4, en)
Priority to JP2022558810A (JP7444280B2, ja)
Publication of WO2022091413A1 (ja)
Priority to US18/302,084 (US20230259828A1, en)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Definitions

  • the disclosed technology is related to machine learning technology.
  • a system has been proposed that calculates the similarity between drugs and estimates the side effects of the designated drug.
  • This system has a similarity calculator and a side effect determination device.
  • the similarity calculator obtains data related to a drug set from a plurality of open data sources, generates an RDF (Resource Description Framework) triple, and stores an RDF graph of the RDF triple.
  • the similarity calculator calculates the similarity between each drug and all other drugs by generating feature vectors for each drug based on RDF triples and comparing the feature vectors.
  • the side effect determination device estimates the side effect of the designated drug based on the similarity of the drug.
  • Drug similarity obtained by comparing feature vectors may not, by itself, be enough to estimate side effects with sufficient accuracy.
  • even patients receiving the same drug may have different side effects if the patients are affected by different diseases.
  • the above situation is not limited to the case of estimating side effects based on the similarity of medicines, but may occur when some event is estimated using a machine learning model in which machine learning is executed using past cases as training data.
  • the disclosed technique aims to train machine learning models to improve the accuracy of event estimation.
  • the disclosed technique inputs training data including an embedded vector of graph data, an embedded vector of ontology, and a correct label.
  • the disclosed technique also performs machine learning of machine learning models based on the loss function.
  • the loss function is calculated from the correct label and a value obtained by combining the value of the activation function calculated only from the embedded vector of the graph data in the input training data with the value of the activation function calculated only from the embedded vector of the ontology in the training data.
  • the machine learning model can be trained to improve the estimation accuracy of the event.
  • the past case data is used.
  • the case data shall include information on the patient's attributes, the drug administered, the disease the patient is suffering from, and the like.
  • the ontology is a systematization of background knowledge in the target field. In this embodiment, for example, information such as the similarity and relationships of diseases, and the similarity and ingredients of drugs, is organized in a tree structure or the like. Similar side effects can occur when diseases are similar or when drugs containing the same ingredients are administered. Such a possibility can therefore be estimated by using a feature vector that includes the ontology information as elements.
  • the method transforms case data into graph data consisting of nodes and edges connecting the nodes, and combines this graph data with a tree-structured ontology. Then, the method calculates an embedded vector representing each node from the graph data in which the case data and the ontology are combined. Further, the method trains a machine learning model using feature vectors generated from these embedded vectors as training data.
  • the ontology information is appropriately reflected in the machine learning of the machine learning model.
  • the machine learning system includes a machine learning device 10 and an estimation device 30.
  • the machine learning device 10 will be described.
  • the case data for machine learning is data including information on the attributes of the patient, the drug administered, the disease the patient is suffering from, and the side effects.
  • FIG. 2 shows an example of case data for machine learning.
  • information on "ID", "gender", "age", "weight", "height", "pharmaceutical product", "disease", and "side effect" is included for each patient.
  • the "ID" is patient identification information.
  • "Gender", "age", "weight", and "height" are examples of patient attributes.
  • "Pharmaceutical product" is the name of a drug given to the patient.
  • "Disease" is the name of the underlying disease that the patient is suffering from.
  • the "side effect" is information on the side effect that occurred when the drug shown in "pharmaceutical product" was administered.
  • FIG. 3 shows an example of an ontology.
  • the drug ontology is tree-structured information that includes nodes indicating drugs (circles containing a drug name), nodes indicating background knowledge (ellipses containing the background knowledge), and edges (arrows) connecting related nodes. An edge may be associated with related information indicating how the drug and the background knowledge are related.
  • In the example of FIG. 3, a node indicating a drug and a node indicating a severe infection are connected by an edge, and related information indicating that administration is prohibited ("Contraindications" in FIG. 3) is given to the edge.
  • Similarly, the disease ontology is tree-structured information that includes nodes indicating diseases (circles containing a disease name), nodes indicating background knowledge (ellipses containing the background knowledge), and edges (arrows) connecting related nodes. For example, when a disease called alcohol intake is classified as a mental illness, a node indicating alcohol intake and a node indicating mental illness are connected by an edge, and related information such as "classification" is given to the edge.
  • the machine learning device 10 includes a graph generation unit 12, an embedded vector calculation unit 14, a training data generation unit 16, and a machine learning unit 18.
  • the graph generation unit 12 acquires the machine learning case data input to the machine learning device 10 and generates, from the acquired case data, graph data composed of nodes and edges connecting the nodes.
  • the graph generation unit 12 generates each value of each item other than the side effect included in the machine learning case data as a node.
  • In the example of FIG. 4, each node drawn as a circle containing a value indicates an attribute, a drug, or a disease.
  • the graph generation unit 12 connects an edge from each "ID" node to the nodes indicating the attributes, drugs, and diseases of the patient indicated by that ID.
  • FIG. 4 also shows the nodes indicating side effects (rounded squares containing the side effect) and the edges connecting each "ID" node to the node indicating its side effect.
  • the method of generating graph data is not limited to the above example, and other methods may be adopted.
  • the graph data generated from the case data is referred to as "case graph data". In the following description, the case graph data does not include the node showing the side effect.
  • the graph generation unit 12 generates graph data in which an ontology is connected to the case graph data based on the case data for machine learning. Specifically, the graph generation unit 12 connects the case graph data and the ontology by sharing the nodes that match between the case graph data and the ontology. For example, the graph generation unit 12 searches the drug ontology and the disease ontology for nodes that match the nodes indicating "pharmaceutical product" and "disease" contained in the case graph data, and extracts each matching node and the portion connected to it.
  • the graph generation unit 12 connects the portion extracted from the ontology to the case graph data so as to superimpose the matching nodes indicating "pharmaceutical product" or "disease", as shown by the broken lines in the figure.
  • the graph data in which the part extracted from the ontology is connected to the case graph data is referred to as "overall graph data".
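  • The graph construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record fields, the `item:value` node naming, the triple format, and the choice to follow only outgoing ontology edges from a matching node are all assumptions, and the sample records are hypothetical.

```python
# Hypothetical sketch: field names, node naming, and sample data are assumptions.

def build_case_graph(cases):
    """Create (head, relation, tail) edges from each ID node to its item values."""
    edges = set()
    for case in cases:
        pid = "ID:" + case["ID"]
        for item, value in case.items():
            # The side effect is the label source, so it is kept out of the graph.
            if item in ("ID", "side effect"):
                continue
            edges.add((pid, item, f"{item}:{value}"))
    return edges


def nodes_of(edges):
    """All node names that appear as a head or a tail."""
    return {h for h, _, _ in edges} | {t for _, _, t in edges}


def extract_connected(ontology, start):
    """Collect ontology edges reachable from start along outgoing edges (assumption)."""
    seen, frontier, part = {start}, [start], set()
    while frontier:
        node = frontier.pop()
        for h, r, t in ontology:
            if h == node:
                part.add((h, r, t))
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return part


def build_overall_graph(case_edges, ontologies):
    """Join ontology parts to the case graph by sharing nodes with matching names."""
    overall = set(case_edges)
    case_nodes = nodes_of(case_edges)
    for ontology in ontologies:
        for node in case_nodes & nodes_of(ontology):
            overall |= extract_connected(ontology, node)
    return overall


cases = [
    {"ID": "A", "gender": "F", "pharmaceutical product": "drug_x",
     "disease": "alcohol intake", "side effect": "rash"},
]
drug_ontology = {
    ("pharmaceutical product:drug_x", "contraindication", "severe infection"),
    ("pharmaceutical product:drug_z", "contraindication", "severe infection"),
}
disease_ontology = {
    ("disease:alcohol intake", "classification", "mental illness"),
}

case_graph = build_case_graph(cases)
overall = build_overall_graph(case_graph, [drug_ontology, disease_ontology])
```

  • In this sketch the ontology edge for `drug_z` is not pulled in, because no case node matches it and only outgoing edges are followed from matching nodes.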
  • the embedded vector calculation unit 14 calculates an embedded vector representing each node included in the overall graph data based on the overall graph data. Specifically, the embedded vector calculation unit 14 calculates the embedded vectors by mapping each of the nodes and edges included in the overall graph data to an n-dimensional vector space. More specifically, the calculation performed by the embedded vector calculation unit 14 will be described using, as an example, graph data that includes nodes A, B, and C, an edge r between nodes A and B, and an edge r between nodes C and B, as shown in the upper part of FIG. 6. Here, for the sake of simplicity, a case of mapping to a two-dimensional vector space will be described.
  • the embedded vector calculation unit 14 arranges each of the nodes and edges included in the graph data in the vector space as an initial value vector, as shown in the middle diagram of FIG. Then, the embedded vector calculation unit 14 optimizes the arrangement of each vector so as to express the connection relationship of the nodes.
  • Specifically, the arrangement is optimized so that vector A + vector r is close to vector B and vector C + vector r is close to vector B.
  • the optimized vector becomes the embedded vector of the node indicated by the vector.
  • the embedded vector calculation unit 14 calculates the embedded vector for each node included in the overall graph data by the calculation method as described above.
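  • As a rough illustration of the optimization described above, the following sketch places nodes A, B, C and edge r in a two-dimensional space and moves them by gradient descent so that A + r and C + r approach B. The squared-distance objective, learning rate, step count, and random initial values are assumptions; the text does not name a specific algorithm, and practical translation-based embedding methods add safeguards (such as negative sampling and normalization) that are omitted here.

```python
import random


def loss(vec, triples):
    """Sum of squared distances ||head + relation - tail||^2 over all triples."""
    total = 0.0
    for h, r, t in triples:
        total += sum((vec[h][k] + vec[r][k] - vec[t][k]) ** 2
                     for k in range(len(vec[h])))
    return total


def train_embeddings(triples, dim=2, lr=0.05, steps=300, seed=0):
    """Optimize vector positions so that head + relation lands near tail."""
    rng = random.Random(seed)
    # Nodes and edge labels share one name space in this toy example.
    names = sorted({x for triple in triples for x in triple})
    # Initial value vectors: every node and edge starts at a random position.
    vec = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in names}
    initial = loss(vec, triples)
    for _ in range(steps):
        grad = {n: [0.0] * dim for n in names}
        for h, r, t in triples:
            for k in range(dim):
                d = 2.0 * (vec[h][k] + vec[r][k] - vec[t][k])
                grad[h][k] += d
                grad[r][k] += d
                grad[t][k] -= d
        # Apply the accumulated gradient to every vector at once.
        for n in names:
            for k in range(dim):
                vec[n][k] -= lr * grad[n][k]
    return vec, initial, loss(vec, triples)


# Graph from the example: edge r from A to B and edge r from C to B.
triples = [("A", "r", "B"), ("C", "r", "B")]
vectors, loss_before, loss_after = train_embeddings(triples)
```

  • After training, `vectors["A"]`, `vectors["B"]`, and `vectors["C"]` play the role of the embedded vectors of the nodes.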
  • the training data generation unit 16 generates training data to be used for machine learning of the machine learning model from the embedded vectors calculated by the embedded vector calculation unit 14 and a correct label generated from the side effect information. Specifically, for each "ID" node included in the overall graph data, the training data generation unit 16 concatenates the vector values of the embedded vectors calculated for the nodes connected to that node to generate a feature. Then, based on the side effect information, the training data generation unit 16 generates a correct label that indicates "TRUE" when the target side effect has occurred and "FALSE" when it has not, and adds the label to the feature to generate training data.
  • FIG. 7 shows an example of training data.
  • the features include the concatenation of the embedded vectors of the nodes of the case graph data (hereinafter referred to as "case data features"), the concatenation of the embedded vectors of the drug ontology nodes (hereinafter referred to as "drug features"), and the concatenation of the embedded vectors of the disease ontology nodes (hereinafter referred to as "disease features").
  • the embedded vector of a node common to the case graph data and an ontology (a node indicating the item "drug" or "disease" of the case data) is used both in the case data features and in the drug features or the disease features.
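  • The feature generation described above can be sketched as below. The helper names, the node ordering, the hand-written embedding values, and the hypothetical side effect "rash" are all assumptions made for illustration; in particular, the sketch does not address how feature lengths would be kept uniform across patients.

```python
# Hypothetical sketch: node names, embedding values, and ordering are assumptions.

def nodes_of(edges):
    """All node names that appear as a head or a tail."""
    return {h for h, _, _ in edges} | {t for _, _, t in edges}


def reachable(edges, start):
    """Start node plus the nodes reachable along outgoing edges, in a stable order."""
    order, frontier = [start], [start]
    while frontier:
        node = frontier.pop(0)
        for h, _, t in sorted(edges):
            if h == node and t not in order:
                order.append(t)
                frontier.append(t)
    return order


def concat(embeddings, nodes):
    """Concatenate the embedded vectors of the given nodes into one flat list."""
    return [value for node in nodes for value in embeddings[node]]


def make_example(pid, case_edges, drug_onto, disease_onto, embeddings,
                 occurred, target):
    """Build case data features (T), drug features (O1), disease features (O2)."""
    case_nodes = sorted(t for h, _, t in case_edges if h == pid)
    drug_nodes = [n for c in case_nodes if c in nodes_of(drug_onto)
                  for n in reachable(drug_onto, c)]
    disease_nodes = [n for c in case_nodes if c in nodes_of(disease_onto)
                     for n in reachable(disease_onto, c)]
    return {
        "T": concat(embeddings, case_nodes),
        "O1": concat(embeddings, drug_nodes),
        "O2": concat(embeddings, disease_nodes),
        # Correct label: TRUE when the target side effect occurred.
        "label": target in occurred,
    }


case_edges = {("ID:A", "gender", "gender:F"),
              ("ID:A", "drug", "drug:x"),
              ("ID:A", "disease", "disease:y")}
drug_onto = {("drug:x", "contraindication", "severe infection")}
disease_onto = {("disease:y", "classification", "mental illness")}
embeddings = {"gender:F": [0.1, 0.2], "drug:x": [0.3, 0.4],
              "disease:y": [0.5, 0.6], "severe infection": [0.7, 0.8],
              "mental illness": [0.9, 1.0]}

example = make_example("ID:A", case_edges, drug_onto, disease_onto,
                       embeddings, occurred={"rash"}, target="rash")
```

  • Note that the vectors of `drug:x` and `disease:y` appear both in `T` and in `O1` or `O2`, matching the statement that shared nodes are used in both groups of features.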
  • the machine learning unit 18 updates the parameters of the machine learning model 20 configured by, for example, a neural network, using the training data generated by the training data generation unit 16.
  • FIG. 8 schematically shows the network configuration of the machine learning model 20.
  • the machine learning model 20 includes a first hidden layer, a second hidden layer, a third hidden layer, and a fourth hidden layer.
  • the case data features are input to the first hidden layer, the drug features are input to the second hidden layer, and the disease features are input to the third hidden layer.
  • the outputs of the first, second, and third hidden layers and all the features included in the training data are input to the fourth hidden layer.
  • the machine learning model 20 outputs the probability that the side effect of the target occurs based on the output from the fourth hidden layer.
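  • A forward pass through a network of this shape might look like the sketch below. The layer widths, the random weights, the use of ReLU in the first three layers, and the single sigmoid unit in the fourth layer whose output is read directly as the probability are assumptions chosen to keep the example small.

```python
import math
import random


def dense(x, weights, bias):
    """Affine transform: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a layer (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(params, T, O1, O2):
    """Each feature group passes through its own hidden layer before merging."""
    h1 = relu(dense(T, *params["layer1"]))    # case data features only
    h2 = relu(dense(O1, *params["layer2"]))   # drug features only
    h3 = relu(dense(O2, *params["layer3"]))   # disease features only
    # The fourth layer sees all raw features plus the three hidden outputs.
    merged = T + O1 + O2 + h1 + h2 + h3
    return sigmoid(dense(merged, *params["layer4"])[0])


rng = random.Random(0)
T = [0.5, 0.6, 0.3, 0.4, 0.1, 0.2]
O1 = [0.3, 0.4, 0.7, 0.8]
O2 = [0.5, 0.6, 0.9, 1.0]
hidden = 3
params = {
    "layer1": init_layer(rng, len(T), hidden),
    "layer2": init_layer(rng, len(O1), hidden),
    "layer3": init_layer(rng, len(O2), hidden),
    "layer4": init_layer(rng, len(T) + len(O1) + len(O2) + 3 * hidden, 1),
}
probability = forward(params, T, O1, O2)
```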
  • the machine learning unit 18 updates the parameters of the machine learning model 20 of the network configuration as described above so as to minimize the value LOSS of the loss function shown below.
  • G (A, B) is a loss function of A and B, for example, a function that calculates a sum-of-squares error or a cross-entropy error.
  • Label is a function that returns 1 when the correct label is TRUE and 0 when the correct label is FALSE.
  • Output is the value output by the machine learning model 20 when the features of the training data are input.
  • T is a vector consisting of the features of the case data among the features included in the training data.
  • O1 is a vector consisting of the features of the drug among the features included in the training data.
  • O2 is a vector consisting of disease features among the features included in the training data.
  • f1 is an activation function corresponding to the first hidden layer
  • f2 is an activation function corresponding to the second hidden layer
  • f3 is an activation function corresponding to the third hidden layer.
  • This activation function is, for example, ReLU (Rectified Linear Unit). That is, f1 (T) is the value of the activation function calculated only by the embedded vector of the node of the case graph data among the input training data. Further, f2 (O1) is the value of the activation function calculated only by the embedded vector of the node of the drug ontology in the input training data. Further, f3 (O2) is the value of the activation function calculated only by the embedded vector of the node of the disease ontology in the input training data.
  • f4 is an activation function corresponding to the fourth hidden layer, and is, for example, a sigmoid function. That is, f4 (T, O1, O2, f1 (T), f2 (O1), f3 (O2)) is the value obtained by applying the activation function to a vector that combines all the features and the outputs of the first to third hidden layers.
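  • The equation for LOSS itself is not reproduced in this text. Read from the symbol definitions above, one plausible form is the following; it is an assumption inferred from those definitions, not a quotation of the original equation.

```latex
\mathrm{LOSS} = G(\mathrm{Label},\ \mathrm{Output}), \qquad
\mathrm{Output} = f_4\bigl(T,\ O_1,\ O_2,\ f_1(T),\ f_2(O_1),\ f_3(O_2)\bigr)
```

  • In this reading, f1, f2, and f3 each see only one group of features, and f4 combines their outputs with the raw features before the result is compared with the correct label.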
  • the machine learning unit 18 determines that the value LOSS of the loss function has been minimized when, for example, LOSS is equal to or less than a predetermined threshold value, the difference from the previously calculated LOSS is equal to or less than a predetermined value, or the number of repetitions of machine learning has reached a predetermined number. When the machine learning unit 18 determines that the value LOSS of the loss function has been minimized, it terminates machine learning and outputs the machine learning model 20, including the network configuration information and the parameter values at the time machine learning ended.
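  • The termination check described above can be written as a small predicate. The function name and the default threshold values are hypothetical; the text only says that the thresholds are predetermined.

```python
def is_minimized(loss, previous_loss, iteration,
                 loss_threshold=1e-3, delta_threshold=1e-6, max_iterations=1000):
    """Return True when any of the three stopping conditions holds."""
    # Condition 1: the loss itself is small enough.
    if loss <= loss_threshold:
        return True
    # Condition 2: the loss has stopped changing between repetitions.
    if previous_loss is not None and abs(previous_loss - loss) <= delta_threshold:
        return True
    # Condition 3: the predetermined number of repetitions has been reached.
    return iteration >= max_iterations
```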
  • the estimation device 30 receives the ontology and estimation target case data, which is case data for which side effects are to be estimated and whose correct answer is unknown.
  • the estimation target case data is case data excluding the item of "side effect" from the case data for machine learning.
  • the estimation device 30 includes a graph generation unit 32, an embedded vector calculation unit 34, and an estimation unit 36, as shown in FIG. Further, the machine learning model 20 output from the machine learning device 10 is stored in a predetermined storage area of the estimation device 30.
  • the graph generation unit 32 is the same as the graph generation unit 12 of the machine learning device 10 except that the data that is the source of generating the graph data is not the machine learning case data but the estimation target case data. Further, the embedded vector calculation unit 34 is also the same as the embedded vector calculation unit 14 of the machine learning device 10.
  • for each "ID" node included in the overall graph data generated by the graph generation unit 32, the estimation unit 36 concatenates the vector values of the embedded vectors calculated by the embedded vector calculation unit 34 for the nodes connected to that node to generate a feature.
  • the generated features include the case data features, the drug features, and the disease features, in the same way as the features included in the training data generated by the training data generation unit 16 of the machine learning device 10.
  • the estimation unit 36 outputs an estimation result indicating whether or not the target side effect occurs for the estimation target case data. For example, as shown in the figure, the estimation unit 36 inputs the features generated from the estimation target case data for each of the patients whose "ID" is C and D into the machine learning model 20 and obtains the probability that the target side effect occurs.
  • the estimation unit 36 outputs TRUE when the acquired probability is equal to or higher than a predetermined value, and outputs FALSE when the acquired probability is less than a predetermined value.
  • the estimation unit 36 may output the probability output from the machine learning model 20 as it is as an estimation result.
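  • The two output modes described above can be sketched as follows. The function name, the `as_probability` flag, and the default threshold of 0.5 are assumptions; the text only refers to a predetermined value.

```python
def estimate(probability, threshold=0.5, as_probability=False):
    """Turn the model's probability into an estimation result."""
    if as_probability:
        # Output the probability from the model as it is.
        return probability
    # TRUE when the probability is equal to or higher than the threshold.
    return probability >= threshold
```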
  • the machine learning device 10 can be realized by, for example, the computer 40 shown in FIG.
  • the computer 40 includes a CPU (Central Processing Unit) 41, a memory 42 as a temporary storage area, and a non-volatile storage unit 43. Further, the computer 40 includes an input / output device 44 such as an input unit and a display unit, and an R / W (Read / Write) unit 45 that controls reading and writing of data to the storage medium 49. Further, the computer 40 includes a communication I / F (Interface) 46 connected to a network such as the Internet.
  • the CPU 41, the memory 42, the storage unit 43, the input / output device 44, the R / W unit 45, and the communication I / F 46 are connected to each other via the bus 47.
  • the storage unit 43 can be realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like.
  • a machine learning program 50 for making the computer 40 function as a machine learning device 10 is stored in the storage unit 43 as a storage medium.
  • the machine learning program 50 includes a graph generation process 52, an embedded vector calculation process 54, a training data generation process 56, and a machine learning process 58.
  • the CPU 41 reads the machine learning program 50 from the storage unit 43, expands it into the memory 42, and sequentially executes the processes of the machine learning program 50.
  • the CPU 41 operates as the graph generation unit 12 shown in FIG. 1 by executing the graph generation process 52. Further, the CPU 41 operates as the embedded vector calculation unit 14 shown in FIG. 1 by executing the embedded vector calculation process 54. Further, the CPU 41 operates as the training data generation unit 16 shown in FIG. 1 by executing the training data generation process 56. Further, the CPU 41 operates as the machine learning unit 18 shown in FIG. 1 by executing the machine learning process 58.
  • the computer 40 that has executed the machine learning program 50 functions as the machine learning device 10.
  • the CPU 41 that executes the program is hardware.
  • the estimation device 30 can be realized by, for example, the computer 60 shown in FIG.
  • the computer 60 includes a CPU 61, a memory 62, a storage unit 63, an input / output device 64, an R / W unit 65, and a communication I / F 66.
  • the CPU 61, the memory 62, the storage unit 63, the input / output device 64, the R / W unit 65, and the communication I / F 66 are connected to each other via the bus 67.
  • the storage unit 63 can be realized by an HDD, SSD, flash memory, or the like.
  • the storage unit 63 as a storage medium stores an estimation program 70 for causing the computer 60 to function as the estimation device 30.
  • the estimation program 70 includes a graph generation process 72, an embedded vector calculation process 74, and an estimation process 76. Further, the storage unit 63 has an information storage area 80 in which information constituting the machine learning model 20 that has been machine-learned is stored.
  • the CPU 61 reads the estimation program 70 from the storage unit 63, expands the estimation program 70 into the memory 62, and sequentially executes the processes of the estimation program 70.
  • the CPU 61 operates as the graph generation unit 32 shown in FIG. 9 by executing the graph generation process 72. Further, the CPU 61 operates as the embedded vector calculation unit 34 shown in FIG. 9 by executing the embedded vector calculation process 74. Further, the CPU 61 operates as the estimation unit 36 shown in FIG. 9 by executing the estimation process 76. Further, the CPU 61 reads information from the information storage area 80 and expands the machine learning model 20 into the memory 62. As a result, the computer 60 that has executed the estimation program 70 functions as the estimation device 30.
  • the CPU 61 that executes the program is hardware.
  • the functions realized by each of the machine learning program 50 and the estimation program 70 can also be realized by, for example, a semiconductor integrated circuit, more specifically, an ASIC (Application Specific Integrated Circuit) or the like.
  • the machine learning device 10 executes the machine learning process shown in FIG. Then, the machine learning model 20 machine-learned by executing the machine learning process is output from the machine learning device 10.
  • When the estimation device 30 has acquired the machine learning model 20 output from the machine learning device 10 and stored it in a predetermined storage area, and the estimation target case data and the ontology are input to the estimation device 30, the estimation device 30 executes the estimation process shown in FIG.
  • the machine learning process is an example of the machine learning method of the disclosed technique
  • the estimation process is an example of the estimation method of the disclosed technique.
  • each of the machine learning process and the estimation process will be described in detail.
  • In step S10, the graph generation unit 12 generates each value of each item of the machine learning case data as a node. Then, the graph generation unit 12 generates case graph data by connecting an edge from each "ID" node to the nodes indicating the attributes, drugs, and diseases of the patient indicated by that ID.
  • In step S12, the graph generation unit 12 searches the drug ontology and the disease ontology for nodes that match the nodes indicating "drug" and "disease" included in the case graph data, and extracts each matching node and the portion connected to it. Then, the graph generation unit 12 connects the portion extracted from the ontology to the case graph data so as to superimpose the matching nodes indicating "pharmaceutical products" or "diseases", and generates the overall graph data.
  • In step S14, the embedded vector calculation unit 14 arranges each of the nodes and edges included in the overall graph data in an n-dimensional vector space as an initial value vector. Then, the embedded vector calculation unit 14 calculates the embedded vector of each node included in the overall graph data by optimizing the arrangement of each vector so as to express the connection relationship of the nodes. In this way, the embedded vector of each node of the case graph data and the embedded vector of each node of the ontology are calculated.
  • In step S16, for each "ID" node included in the overall graph data, the training data generation unit 16 concatenates the vector values of the embedded vectors calculated for the nodes connected to that node to generate a feature. Then, the training data generation unit 16 generates a correct label for the target side effect based on the side effect information and adds the label to the feature to generate the training data.
  • In step S18, the machine learning unit 18 uses the training data generated in step S16 to update the parameters of the machine learning model 20 so as to minimize the value LOSS of the loss function described above.
  • When the machine learning unit 18 determines that the value LOSS of the loss function has been minimized, it terminates machine learning and outputs the machine learning model 20, including the network configuration information and the parameter values at the time machine learning ended. Then, the machine learning process ends.
  • In step S20, the graph generation unit 32 generates case graph data from the estimation target case data.
  • In step S22, the graph generation unit 32 connects the ontology to the case graph data and generates the overall graph data.
  • In step S24, the embedded vector calculation unit 34 calculates the embedded vector of each node of the case graph data and the ontology from the overall graph data.
  • In step S26, for each "ID" node included in the overall graph data, the estimation unit 36 concatenates the vector values of the embedded vectors calculated for the nodes connected to that node to generate a feature.
  • In step S28, the estimation unit 36 inputs the features generated in step S26 into the machine learning model 20, obtains and outputs an estimation result indicating whether or not the target side effect occurs for the estimation target case data, and the estimation process ends.
  • the machine learning device receives training data that includes embedded vectors of the case graph data, embedded vectors of the ontology, and a correct label. Then, the machine learning device executes machine learning of the machine learning model based on the loss function.
  • the value of the loss function is calculated from the correct label and a value obtained by combining, among the input training data, the value of the activation function calculated only from the embedded vectors of the case graph data with the value of the activation function calculated only from the embedded vectors of the ontology.
  • the machine learning device according to the first embodiment can therefore train a machine learning model in which the information of the case data and the information of the ontology are each transmitted as a group. As a result, the machine learning device according to the first embodiment can appropriately reflect the information of the ontology and train the machine learning model so as to improve the estimation accuracy of the event.
  • the estimation device estimates the event for the estimation target case using the machine learning model trained as described above and the embedded vectors calculated from the estimation target case graph data and the ontology. This improves the estimation accuracy of the event.
  • the machine learning system includes a machine learning device 210 and an estimation device 230.
  • the machine learning device 210 will be described. Functionally, as shown in FIG. 1, the machine learning device 210 includes a graph generation unit 12, an embedded vector calculation unit 214, a training data generation unit 16, and a machine learning unit 18.
  • the embedded vector calculation unit 214 first calculates the embedded vectors for the nodes of the ontology in the overall graph data in which the ontology is connected to the case graph data. For example, the embedded vector calculation unit 214 calculates the embedded vectors of the nodes of the drug ontology (the nodes shown by solid lines in FIG. 15). Further, as shown in FIG. 16, the embedded vector calculation unit 214 calculates the embedded vectors of the disease ontology nodes (the nodes shown by solid lines in FIG. 16). Then, as shown in FIG. 17, the embedded vector calculation unit 214 uses the embedded vectors of the ontology nodes as initial values (broken line portion in FIG. 17) and calculates the embedded vectors of the nodes of the case graph data (nodes shown by the solid line in FIG. 16).
  • the embedded vectors of the ontology accurately reflect the meaning of the connections between the nodes. The more appropriate the initial values, the more accurately the embedded vectors can be calculated, so using the embedded vectors of the ontology as initial values allows the embedded vectors of the case graph data to be calculated accurately.
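  • The two-stage calculation can be sketched as a warm start: embed the ontology first, then reuse those vectors as initial values when embedding the overall graph. The squared-distance objective, the hyperparameters, and the sample triples are assumptions, and the sketch keeps updating the ontology vectors in the second stage because the text does not say whether they are held fixed.

```python
import random


def residual(vec, triples):
    """Sum of squared distances ||head + relation - tail||^2 over all triples."""
    return sum((vec[h][k] + vec[r][k] - vec[t][k]) ** 2
               for h, r, t in triples for k in range(len(vec[h])))


def fit(triples, init=None, dim=2, lr=0.02, steps=500, seed=0):
    """Gradient descent on the residual; names found in init start from init."""
    rng = random.Random(seed)
    init = init or {}
    # Nodes and edge labels share one name space in this toy example.
    names = sorted({x for triple in triples for x in triple})
    vec = {n: list(init[n]) if n in init
           else [rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for n in names}
    for _ in range(steps):
        grad = {n: [0.0] * dim for n in names}
        for h, r, t in triples:
            for k in range(dim):
                d = 2.0 * (vec[h][k] + vec[r][k] - vec[t][k])
                grad[h][k] += d
                grad[r][k] += d
                grad[t][k] -= d
        for n in names:
            for k in range(dim):
                vec[n][k] -= lr * grad[n][k]
    return vec


ontology = [("drug_x", "contraindication", "severe infection"),
            ("alcohol intake", "classification", "mental illness")]
case_graph = [("ID:A", "drug", "drug_x"),
              ("ID:A", "disease", "alcohol intake")]

# Stage 1: embed the ontology nodes on their own.
ontology_vec = fit(ontology)
# Stage 2: embed the overall graph, starting from the ontology vectors.
overall_vec = fit(case_graph + ontology, init=ontology_vec)
```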
  • the estimation device 230 includes a graph generation unit 32, an embedded vector calculation unit 234, and an estimation unit 36, as shown in FIG. Further, the machine learning model 20 output from the machine learning device 210 is stored in the predetermined storage area of the estimation device 230.
  • the embedded vector calculation unit 234 first calculates the embedded vector of the ontology, and calculates the embedded vector of the case graph data using this as the initial value, similarly to the embedded vector calculation unit 214 of the machine learning device 210.
  • the machine learning device 210 can be realized by, for example, the computer 40 shown in FIG.
  • the storage unit 43 of the computer 40 stores a machine learning program 250 for making the computer 40 function as a machine learning device 210.
  • the machine learning program 250 has a graph generation process 52, an embedded vector calculation process 254, a training data generation process 56, and a machine learning process 58.
  • the CPU 41 reads the machine learning program 250 from the storage unit 43, expands it into the memory 42, and sequentially executes the processes of the machine learning program 250.
  • the CPU 41 operates as the embedded vector calculation unit 214 shown in FIG. 1 by executing the embedded vector calculation process 254. The other processes are the same as those of the machine learning program 50 according to the first embodiment.
  • the computer 40 that has executed the machine learning program 250 functions as the machine learning device 210.
  • the estimation device 230 can be realized by, for example, the computer 60 shown in FIG.
  • the storage unit 63 of the computer 60 stores an estimation program 270 for causing the computer 60 to function as the estimation device 230.
  • the estimation program 270 includes a graph generation process 72, an embedded vector calculation process 274, and an estimation process 76.
  • the storage unit 63 has an information storage area 80 in which information constituting the machine learning model 20 that has been machine-learned is stored.
  • the CPU 61 reads the estimation program 270 from the storage unit 63, expands the estimation program 270 into the memory 62, and sequentially executes the processes of the estimation program 270.
  • the CPU 61 operates as the embedded vector calculation unit 234 shown in FIG. 9 by executing the embedded vector calculation process 274. Other processes are the same as those of the estimation program 70 according to the first embodiment.
  • the computer 60 that has executed the estimation program 270 functions as the estimation device 230.
  • the functions realized by each of the machine learning program 250 and the estimation program 270 can also be realized by, for example, a semiconductor integrated circuit, more specifically an ASIC or the like.
  • the operation of the second embodiment is the same as that of the first embodiment, except that the method of calculating the embedded vectors in step S14 of the machine learning process shown in FIG. 13 and in step S24 of the estimation process shown in FIG. 14 differs as described above. The description is therefore omitted.
  • the machine learning device first calculates the embedded vector of the ontology, and uses this as the initial value to calculate the embedded vector of the case graph data.
  • this allows the embedded vectors to be calculated with high accuracy and the machine learning model to be trained so that the estimation accuracy of the event improves. Likewise, the estimation device according to the second embodiment improves the estimation accuracy of the event.
  • the drug features and the disease features may be generated from the embedded vectors of the nodes common to the case graph data and the ontology. That is, in the example of FIG. 17, the features of the case data may be generated from the embedded vectors of the case graph data nodes shown by solid lines, and the drug features and the disease features may be generated from the embedded vectors of the nodes surrounded by broken lines.
  • since the embedded vectors of the case graph data are calculated with the embedded vectors of the ontology as initial values, the information of the ontology is still reflected. In addition, because the amount of feature information can be reduced, the load of the machine learning process and the estimation process is reduced. In this case, embedded vectors of the ontology calculated without connecting the ontology to the case graph data may also be given as the initial values of the embedded vectors of the case graph data.
  • the ontology embedded vectors in this case may be calculated only for a part of the ontology: the ontology portion that includes the nodes corresponding to the drug and disease nodes in the case graph data is identified, and the embedded vectors are calculated for that identified portion.
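  • one way to identify such an ontology portion is sketched below. The text does not define how far the portion extends, so this sketch assumes it is the set of ontology triples connected (ignoring edge direction) to any drug or disease node that also appears in the case graph data. The function name `ontology_portion` and the toy node names are hypothetical.

```python
from collections import deque

def ontology_portion(ontology_triples, case_nodes):
    """Return the ontology triples connected to nodes shared with the case graph."""
    # Undirected adjacency over the ontology.
    neighbors = {}
    for h, _, t in ontology_triples:
        neighbors.setdefault(h, set()).add(t)
        neighbors.setdefault(t, set()).add(h)
    # Seeds are the case-graph nodes that also exist in the ontology.
    seen = {n for n in case_nodes if n in neighbors}
    queue = deque(seen)
    while queue:  # breadth-first search from the seeds
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return [(h, r, t) for h, r, t in ontology_triples if h in seen and t in seen]

# Hypothetical toy ontology with one branch unrelated to the cases.
ontology = [
    ("drug:aspirin", "is_a", "class:NSAID"),
    ("drug:ibuprofen", "is_a", "class:NSAID"),
    ("drug:insulin", "is_a", "class:hormone"),
    ("disease:gastritis", "is_a", "class:stomach_disease"),
]
case_nodes = {"case:1", "drug:aspirin", "disease:gastritis"}

portion = ontology_portion(ontology, case_nodes)
```

  • in this toy example the `drug:insulin` branch is left out, so embedded vectors would be calculated only for the three remaining triples.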
  • the case where the disclosed technique is applied to estimating the side effects of administering a drug to a patient has been described, but the disclosed technique can also be applied to estimating other events.
  • for example, the case data may include information such as the chemical substances to be blended and the blending conditions (temperature, catalyst, etc.); information on chemical substances with similar properties, such as substance A and substance B having the same melting point, can be used as an ontology; and the event that occurred during blending may be used as the correct label.
  • the ontology to be used may be one type or three or more types.
  • a hidden layer of the machine learning model may be provided corresponding to each type of ontology to be used.
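  • a model with one hidden layer per ontology type could look like the following forward pass. This is only a sketch under stated assumptions: the layer sizes, the ReLU and sigmoid activations, the group names, and the functions `init_grouped_model` and `forward` are hypothetical and are not taken from the disclosure. Each feature group passes through its own hidden layer, and the hidden outputs are concatenated before a single output unit.

```python
import numpy as np

def init_grouped_model(group_dims, hidden=4, seed=0):
    """Create one hidden layer per feature group plus a shared output unit."""
    rng = np.random.default_rng(seed)
    layers = {name: (rng.normal(0.0, 0.1, (dim, hidden)), np.zeros(hidden))
              for name, dim in group_dims.items()}
    out_w = rng.normal(0.0, 0.1, hidden * len(group_dims))
    return {"layers": layers, "out_w": out_w, "out_b": 0.0}

def forward(model, features):
    """Propagate each group through its own hidden layer, then combine."""
    hidden = [np.maximum(0.0, features[name] @ w + b)  # ReLU per group
              for name, (w, b) in model["layers"].items()]
    z = np.concatenate(hidden) @ model["out_w"] + model["out_b"]
    return 1.0 / (1.0 + np.exp(-z))  # probability-like score for the event

# Hypothetical groups: case data plus two ontology types; adding or removing
# an ontology type only adds or removes an entry here.
group_dims = {"case": 8, "drug_ontology": 8, "disease_ontology": 8}
model = init_grouped_model(group_dims)

rng = np.random.default_rng(1)
features = {name: rng.normal(size=dim) for name, dim in group_dims.items()}
score = forward(model, features)
```

  • training is omitted here; the point is only that the number of hidden layers follows the number of ontology types in use.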
  • the case where the machine learning device and the estimation device are configured by separate computers has been described, but the machine learning device and the estimation device may be configured by one computer.
  • the mode in which the machine learning program and the estimation program are stored (installed) in the storage unit in advance has been described, but the present invention is not limited to this.
  • the program according to the disclosed technique can also be provided in a form stored in a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/JP2020/041077 2020-11-02 2020-11-02 Machine learning program, estimation program, device, and method Ceased WO2022091413A1 (ja)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/JP2020/041077 WO2022091413A1 (ja) 2020-11-02 2020-11-02 Machine learning program, estimation program, device, and method
EP20959928.1A EP4239535A4 (en) 2020-11-02 2020-11-02 MACHINE LEARNING PROGRAM, INFERENCE PROGRAM, APPARATUS AND METHOD
JP2022558810A JP7444280B2 (ja) 2020-11-02 2020-11-02 Machine learning program, estimation program, device, and method
US18/302,084 US20230259828A1 (en) 2020-11-02 2023-04-18 Storage medium, estimation device, and estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/041077 WO2022091413A1 (ja) 2020-11-02 2020-11-02 Machine learning program, estimation program, device, and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/302,084 Continuation US20230259828A1 (en) 2020-11-02 2023-04-18 Storage medium, estimation device, and estimation method

Publications (1)

Publication Number Publication Date
WO2022091413A1 (ja) 2022-05-05

Family

ID=81382205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/041077 Ceased WO2022091413A1 (ja) 2020-11-02 2020-11-02 Machine learning program, estimation program, device, and method

Country Status (4)

Country Link
US (1) US20230259828A1
EP (1) EP4239535A4
JP (1) JP7444280B2
WO (1) WO2022091413A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7680028B2 (ja) * 2021-11-05 2025-05-20 杭州医典智能科技有限公司 Depression diagnosis support system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201280A1 (en) * 2007-02-16 2008-08-21 Huber Martin Medical ontologies for machine learning and decision support
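JP2016212853A (ja) 2015-04-30 2016-12-15 富士通株式会社 Similarity calculation device, and side effect determination device and system that calculate drug similarity and estimate side effects using the similarity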
US10157226B1 (en) * 2018-01-16 2018-12-18 Accenture Global Solutions Limited Predicting links in knowledge graphs using ontological knowledge

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6622236B2 (ja) * 2017-03-06 2019-12-18 株式会社日立製作所 Idea support device and idea support method
EP3382584A1 (en) * 2017-03-30 2018-10-03 Fujitsu Limited A system and a method to predict patient behaviour
JP2020047209A (ja) * 2018-09-21 2020-03-26 沖電気工業株式会社 Ontology processing device and ontology processing program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KOBAYASHI, KENJI ET AL.: "Biomedical Literature Curation System For Evidence-Based Medicine", IPSJ SIG TECHNICAL REPORTS, vol. 2, no. 2019-GN-107, JP , pages 1 - 8, XP009537320, ISSN: 2188-8744 *
See also references of EP4239535A4

Also Published As

Publication number Publication date
EP4239535A4 (en) 2023-12-20
JP7444280B2 (ja) 2024-03-06
JPWO2022091413A1 2022-05-05
US20230259828A1 (en) 2023-08-17
EP4239535A1 (en) 2023-09-06

Similar Documents

Publication Publication Date Title
Hernadez et al. Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions
Gartlehner et al. Data extraction for evidence synthesis using a large language model: A proof‐of‐concept study
Austin et al. Predictive performance of machine and statistical learning methods: Impact of data-generating processes on external validity in the “large N, small p” setting
Guo et al. Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning
Seo et al. Comparing methods for estimating patient‐specific treatment effects in individual patient data meta‐analysis
Salih et al. Characterizing the contribution of dependent features in XAI methods
Rafiei et al. Meta-learning in healthcare: A survey
US10902943B2 (en) Predicting interactions between drugs and foods
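CN113129053A (zh) Information recommendation model training method, information recommendation method, and storage medium
WO2016132588A1 (ja) Data analysis device, data analysis method, and data analysis program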
Vo et al. Assessing the impact of case-mix heterogeneity in individual participant data meta-analysis: novel use of I 2 statistic and prediction interval
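CN118230978B (zh) Disease risk prediction method, system, electronic device, and medium
JP2020119101A (ja) Tensor generation program, tensor generation method, and tensor generation device
CN118299070A (zh) Treatment effect estimation method, system, device, and medium based on counterfactual prediction
CN111523048B (zh) Method, apparatus, storage medium, and terminal for recommending friends in a social network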
Yang et al. Tree-based subgroup discovery using electronic health record data: heterogeneity of treatment effects for DTG-containing therapies
US11194829B2 (en) Methods and system for entity matching
JP7444280B2 (ja) Machine learning program, estimation program, device, and method
van Os et al. Machine Learning‐Based Model Selection and Averaging Outperform Single‐Model Approaches for a Priori Vancomycin Precision Dosing
JP6321845B1 (ja) Assignment device, assignment method, and assignment program
Wang et al. Adaptive treatment strategies for chronic conditions: shared-parameter G-estimation with an application to rheumatoid arthritis
Liang et al. Deep advantage learning for optimal dynamic treatment regime
US20230401455A1 (en) Storage medium, prediction device, and prediction method
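CN115376698B (zh) Apparatus, method, and storage medium for predicting the progression of fundus diseases
JP2022007311A (ja) System for evaluating the risk of information leakage from a learning model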

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 20959928; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase Ref document number: 2022558810; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase Ref country code: DE
ENP Entry into the national phase Ref document number: 2020959928; Country of ref document: EP; Effective date: 20230602