US11568213B2 - Analyzing apparatus, analysis method and analysis program - Google Patents
Analyzing apparatus, analysis method and analysis program
- Publication number
- US11568213B2 (application US16/595,526)
- Authority
- US
- United States
- Prior art keywords
- layer
- data
- learning parameter
- feature
- internal data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06N3/0472—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2193—Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/245—Classification techniques relating to the decision surface
- G06F18/2453—Classification techniques relating to the decision surface non-linear, e.g. polynomial classifier
-
- G06K9/6257—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G06N7/005—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present invention relates to an analyzing apparatus, an analysis method, and an analysis program used for analyzing data.
- Machine learning is one of the technologies used to realize artificial intelligence (AI).
- In machine learning, the calculation of learning parameters, such as the weight vectors of a perceptron, so as to minimize the errors between predicted values obtained from feature vectors and the actual (true) values is called learning.
- After the learning, new predicted values are calculated from data not used for the learning, hereinafter called test data.
- The magnitude of each element of a weight vector is used as the importance of a factor that contributed to the prediction.
- Neural networks, including deep learning, can realize high prediction accuracy.
- However, each element of a feature vector undergoes weighted product-sum operations with the other elements every time it passes through the plurality of perceptrons. Accordingly, it is difficult in principle to know the importance of each element on its own. This becomes a fatal drawback when deep learning is used in real business.
- One of the indices that indicate the effectiveness of a drug in a clinical trial is the length of time, or survival time, from the start of the clinical trial to the end of observation for each patient, whether due to death or to censoring.
- The causes of the end of observation are classified into the case where the patient died and the case where the trial is censored for a reason such as discontinuation of administration or termination of the trial period according to the doctor's determination.
- The most important quantity in the prediction of survival time including censoring is a function of the feature amounts called the hazard function.
- The hazard function is defined at each time point, and its value at time T represents the probability of death, malfunction, or cancellation at time T.
- The integral of the hazard function up to time T, called the cumulative hazard function, gives the probability of death by time T, and the point at which the cumulative hazard function exceeds a threshold is regarded as the time point at which death occurred.
- If the cumulative hazard function does not exceed the threshold at any point in time, the patient is deemed to be alive or censored. Accordingly, prediction of hazard functions is equivalent to prediction of survival time, and the prevailing analysis models used in survival-time prediction treat hazard functions, which are easy to interpret, as the targets of prediction.
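Written out, the relationship just described can be sketched as follows; the symbols Λ for the cumulative hazard and θ for the threshold are notational assumptions, not taken from the patent text:

```latex
\Lambda(T) \;=\; \int_{0}^{T} h(t)\,dt,
\qquad
\hat{T}_{\text{death}} \;=\; \min\{\, T : \Lambda(T) > \theta \,\},
```

and if Λ(T) never exceeds θ within the analysis period, the patient is deemed to be alive or censored.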
- CN-108130372-A discloses a technique for creating a prediction model of a hazard function for acute myelogenous leukemia patients and for analyzing the factors that contribute to the prediction.
- In order to take censoring into consideration and to enable the output of determination factors, the technique disclosed in CN-108130372-A adopts not a non-linear technique such as deep learning, which lacks the functionality of calculating importance, but a linear model called the Cox proportional hazard model.
- CN-106897545-A and Non-Patent Document 1 disclose techniques for creating nonlinear models that predict hazard functions while taking censoring into consideration. These techniques are not aimed at outputting contributing factors, and they adopt deep learning techniques that lack the functionality of outputting such factors.
- the technique disclosed in CN-106897545-A uses a unique network called Deep Belief Network, and the technique disclosed in Non-Patent Document 1 uses a unique network called DeepHit.
- Non-Patent Document 2 discloses a technique in which a linear regression or logistic regression model is newly learned so that the decision results of machine learning techniques such as deep learning, which lack the functionality of calculating the importance of feature amounts, can be explained.
- Logistic regression is a machine learning model equivalent to the perceptron and is the most widely used model across fields.
- The logistic regression illustrated on page 119 of "Friedman J, Trevor H, Robert T. The Elements of Statistical Learning. Second edition. New York: Springer Series in Statistics, 2001," hereinafter referred to as Non-Patent Document 3, has the functionality of calculating the importance of feature amounts over the entire set of data samples.
- The techniques of CN-106897545-A and Non-Patent Document 1 can be applied to general problems in which the linear independence mentioned above does not hold, since they use nonlinear models. However, deep learning models like Deep Belief Network and DeepHit cannot output the factors that contribute to the results of the predictions made by the models.
- Non-Patent Document 2 does not present a method applicable to input data that includes censored data. Furthermore, the technique of Non-Patent Document 2 merely attempts to give an explanation in retrospect using linear regression; even when it attempts to explain ordinary fully connected deep learning, it is not mathematically guaranteed that the importance of the feature amounts that the deep learning actually utilizes for prediction can be completely calculated. Moreover, if linear regression could achieve exactly the same prediction accuracy as deep learning, the deep learning itself would be unnecessary in the first place; the technique of Non-Patent Document 2 is thus contradictory in its design concept.
- The present invention has been made in view of the circumstances explained above, and an object thereof is to facilitate the explanation of features of prediction targets while taking the continuity of analysis into consideration.
- An analyzing apparatus disclosed in the present application is accessible to a database and includes a processor that executes a program and a storage device that stores the program.
- The database stores a training data set including as many pieces of training data as there are learning targets, and each piece of training data includes: first feature data having a plurality of feature amounts of a learning target; a response variable indicating the analysis time from a start of analysis to an end of the analysis about the learning target; and a variable indicating continuity of the analysis within the analysis time.
- The processor executes: a first generation process of generating first internal data on the basis of the first feature data and a first learning parameter; a first conversion process of converting the position of the first feature data in a feature space on the basis of the first internal data generated in the first generation process and a second learning parameter; a reallocation process of reallocating, based on the result of the first conversion in the first conversion process and the first feature data, the first feature data to the position obtained through the conversion in the feature space; a first calculation process of calculating, based on the result of the reallocation in the reallocation process and a third learning parameter, a first predicted value of a hazard function for the analysis time in the case where the first feature data is given; an optimization process of optimizing the first learning parameter, the second learning parameter, and the third learning parameter by a statistical gradient method on the basis of the response variable and the first predicted value calculated in the first calculation process; and a second generation process of generating second internal data on the basis of second feature data including a plurality of feature amounts of a prediction target and the first learning parameter optimized in the optimization process.
- FIG. 1 is an explanatory diagram illustrating the relationship between feature vector and classification boundary
- FIG. 2 A is a block diagram illustrating a hardware configuration example of the analysis system
- FIG. 2 B is a block diagram illustrating a functional configuration example of the analysis system
- FIG. 3 is an explanatory diagram illustrating a structural example of a neural network according to a first embodiment
- FIG. 4 is a flowchart illustrating an example of a learning and prediction process procedure performed by an analyzing apparatus
- FIG. 5 is an explanatory diagram illustrating a neural network setting screen example
- FIG. 6 is an explanatory diagram illustrating a display example of an output panel
- FIG. 7 is an explanatory diagram illustrating another structural example of a neural network.
- FIG. 8 is a table illustrating experimental results.
- The apparatus explained as an example in the first embodiment predicts, for colon cancer patients administered the anti-cancer drug oxaliplatin in a clinical trial of oxaliplatin, a hazard function for three time classes: zero months (shorter than one month), one month (equal to or longer than one month and shorter than two months), and two months (equal to or longer than two months and shorter than three months), and also outputs the factors contributing to the prediction, as sketched below.
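A minimal sketch of the class assignment just described; the function name and month thresholds simply restate the three classes above and are not code from the patent:

```python
# Hypothetical helper mapping survival time in months to the three time classes:
# 0: shorter than one month, 1: one to under two months, 2: two to under three months.
def to_time_class(survival_months: float) -> int:
    if survival_months < 1.0:
        return 0
    if survival_months < 2.0:
        return 1
    return 2

print(to_time_class(0.5))  # 0
print(to_time_class(1.5))  # 1
print(to_time_class(2.7))  # 2
```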
- The factors output by the analyzing apparatus according to the first embodiment, which analyzes data including censored data, allow a pharmaceutical company that markets oxaliplatin to set the scope of application of oxaliplatin appropriately, and also give a good clue for clarifying its mechanism of action. This contributes to the improvement of the quality of medical care and also contributes significantly to the advancement of pharmaceutics and medical science.
- FIG. 1 is an explanatory diagram illustrating the relationship between feature vector and classification boundary.
- FIG. 1 illustrates patients 101, 102, 105, and 106 in a feature space 10 whose dimensions represent features of patients, e.g., daily blood pressure.
- a boundary surface 100 is a true classification boundary surface that separates the patients 101 who cannot survive for one month or longer and the patients 102 who can survive one month or longer.
- a boundary surface 104 is a true classification boundary surface that separates the patients 105 who cannot survive for two months or longer and the patients 106 who can survive two months or longer.
- The boundary surfaces 100 and 104 are typically curved surfaces too complicated for humans to understand. On the other hand, in some cases, even complicated curved surfaces like the boundary surfaces 100 and 104 can be approximated locally by a plane 103.
- An analyzing apparatus uses deep learning capable of processing data including censored data to generate a linear model for each patient.
- FIGS. 2 A and 2 B are block diagrams illustrating a system configuration example of an analysis system. Although a server-client type analysis system 2 is explained as an example in FIGS. 2 A and 2 B , the analysis system may be a stand-alone type system.
- FIG. 2 A is a block diagram illustrating a hardware configuration example of the analysis system 2
- FIG. 2 B is a block diagram illustrating a functional configuration example of the analysis system 2 . The same configurations are given the same signs in FIGS. 2 A and 2 B .
- the client terminal 200 has a hard disk drive (HDD) 201 which is an auxiliary storage apparatus, a memory 202 which is a main storage apparatus, a processor 203 , an input apparatus 204 which is a keyboard or a mouse, and a monitor 205 .
- the analyzing apparatus 220 has an HDD 221 which is an auxiliary storage apparatus, a memory 222 which is a main storage apparatus, a processor 223 , an input apparatus 224 which is a keyboard or a mouse, and a monitor 225 .
- A storage device stores a neural network 300 (see FIG. 3) and its learning parameters.
- the client terminal 200 has a client database (DB) 251 .
- the client DB 251 is stored in the storage device such as the HDD 201 or the memory 202 .
- the client DB 251 stores a test data set 252 , and a prediction result 253 .
- the test data set 252 is a set of test data.
- the prediction result 253 is data obtained from the predicting unit 262 through the network 210 . Note that in the case of a server-client type system, there exist one or more client terminals 200 .
- the analyzing apparatus 220 has a learning unit 261 , a predicting unit 262 , and a server database (DB) 263 .
- the learning unit 261 is a functional unit that outputs learning parameters 265 by using the neural network 300 .
- the predicting unit 262 is a functional unit that: constructs the neural network 300 by using the learning parameters 265 ; executes a prediction process when having received test data input to the neural network 300 ; and outputs the prediction result 253 to the client terminal 200 .
- the learning unit 261 and predicting unit 262 realize their functionalities by causing programs stored in storage devices such as the HDD 221 and the memory 222 to be executed by the processor 223
- the server DB 263 stores a training data set 264 and the learning parameters 265 .
- The training data set 264 is a set of training data constituted by combinations {x(n), Y(n), e(n)} of feature vectors x(n), response variables Y(n), which are the true values thereof, and binary variables e(n) representing whether the data is censored or non-censored.
- Here, n ∈ {1, 2, . . . , N}.
- A feature vector x(n) ∈ R^D, where D is an integer equal to or larger than one, is a D-dimensional real-valued vector and includes information such as the age, gender, medication, and test values of the patient of the n-th patient data.
- The test data set 252 is a set of test data consisting of feature vectors not used as the feature vectors x(n); these other feature vectors are denoted as feature vectors x′(n).
- A response variable Y(n) indicates the survival time mentioned above, that is, the analysis time from the start of analysis to the end of analysis for a learning target.
- The response variable Y(n) is a time class whose class value i is any one of "0," "1," or "2" for the n-th patient data.
- the response variable Y (n) is referred to as a time class Y (n) in some cases.
- the magnitude relationship of class values i is set so as to correspond to the magnitude relationship of survival time on which the time classes i are based.
- A binary variable e(n) indicates the continuity of the analysis (a clinical trial in this example) within the analysis time (the survival time in this example) from the start of analysis to the end of analysis, for the feature vector x(n) that is a learning target.
- The value of the binary variable e(n) is "0" if the patient of the n-th patient data is a censored patient.
- The value of the binary variable e(n) is "1" if the patient is a non-censored patient.
- A "censored" patient is a patient who is still alive at the end of the clinical trial, that is, after the survival time has elapsed.
- A "non-censored" patient is a patient whose death ended the observation before the end of the clinical trial, that is, at the time at which the survival time had elapsed after the start of the trial.
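To make the data layout concrete, here is an illustrative sketch of a few training records {x(n), Y(n), e(n)}; all values and the feature dimension are invented for illustration and are not from the patent:

```python
import numpy as np

# Each record: feature vector x, time class Y (0, 1, or 2), censoring indicator e
# (0 = censored, 1 = non-censored).
D = 4  # e.g., age, gender flag, dose, test value
training_data = [
    (np.array([63.0, 1.0, 0.5, 1.2]), 2, 1),  # died during time class 2
    (np.array([55.0, 0.0, 0.8, 0.9]), 2, 0),  # alive at the end of the trial -> censored
    (np.array([71.0, 1.0, 0.3, 1.5]), 0, 1),  # died within the first month
]
for x, Y, e in training_data:
    assert x.shape == (D,) and Y in (0, 1, 2) and e in (0, 1)
```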
- The learning parameters 265 are output data from the learning unit 261 and include the learning parameters {Wh, W, w} mentioned below.
- the neural network 300 in which the learning parameters 265 are set is referred to as a prediction model.
- Initial values of the learning parameters {Wh, W, w} are determined randomly.
- the analyzing apparatus 220 may be constituted by a plurality of analyzing apparatuses. For example, there may be a plurality of analyzing apparatuses 220 for load balancing. In addition, the analyzing apparatus 220 may be constituted by a plurality of analyzing apparatuses, each analyzing apparatus being responsible for a certain functionality. For example, the analyzing apparatus 220 may be constituted by a first server including the learning unit 261 and server DB 263 , and a second server including the predicting unit 262 and server DB 263 .
- the analyzing apparatus 220 may be constituted by a first analyzing apparatus including the learning unit 261 and predicting unit 262 , and a second analyzing apparatus including the server DB 263 .
- the analyzing apparatus 220 may be constituted by a first server including the learning unit 261 , a second analyzing apparatus including the predicting unit 262 , and a third analyzing apparatus including the server DB 263 .
- FIG. 3 is an explanatory diagram illustrating a structural example of the neural network 300 according to the first embodiment.
- the neural network 300 is used in the learning unit 261 and predicting unit 262 .
- the neural network 300 has a neuron group 302 , a transform unit group 303 , a reallocation unit 304 , a decision unit 305 , and an importance unit 306 .
- the feature vector x (n) to be input data is illustrated as an input neuron 301 .
- The neuron group 302 is a set of neurons 302(1) to 302(L) in L layers, where L is an integer equal to or larger than one.
- A neuron 302(k) receives output data from the neuron 302(k−1) in the adjacent higher layer. Note that the neuron 302(1) receives an input of the feature vector x(n).
- The neuron 302(k) calculates an internal vector h(k) based on the feature vector x(n) and a learning parameter Whk ∈ R^(D×D′), where D′ is an integer equal to or larger than one, as illustrated in the following Formula (1).
- the index n is omitted in order to facilitate explanation.
- $\vec{h}^{(k)} = \sigma(W_{hk}\,\vec{x})$  (1), where $\vec{x}$ is the feature vector x.
- The activation function σ is, for example, a sigmoid function.
- The activation function σ may also be a function such as tanh, softplus, or ReLU.
- The type of activation function ("Activation"), the number of layers ("Inner layers") of the neuron group 302, and the number of dimensions D′ ("Number of neurons") of the internal vector h(k) can be set freely (see FIG. 5).
- The neuron 302(k) receives an output from the neuron 302(k−1) of the layer (k−1), which is the adjacent higher layer, executes the above-mentioned Formula (1), and outputs the result of the calculation to the layer (k+1), which is the adjacent lower layer.
- The neuron 302(1), which is in the first layer, receives the feature vector x(n), executes the above-mentioned Formula (1), and outputs the result of the calculation to the second layer, which is the adjacent lower layer.
- At the time of prediction, the neuron group 302 can execute processing similar to that executed at the time of learning by using the learning parameter Wh generated based on the above-mentioned Formula (1) and the following Formulas (5) and (6).
- Each piece of information used at the time of prediction is given a prime mark "′," as in the feature vector x′(n).
- The neuron 302(k) receives output data from the neuron 302(k−1), which is in the adjacent higher layer.
- The neuron 302(1) receives an input of the feature vector x′(n), which belongs to the test data set 252.
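As a concrete illustration of the neuron group's forward pass, here is a minimal sketch of Formula (1) applied layer by layer, assuming a sigmoid activation and assuming that the first layer receives x while each later layer receives the previous layer's output, as described above; the layer sizes and the D′×D′ shape of the later matrices are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, D_prime, L = 4, 8, 3                     # illustrative sizes
rng = np.random.default_rng(0)
x = rng.normal(size=D)                      # feature vector x (or x' at prediction time)

# Learning parameters W_hk: the first maps D -> D', the later ones D' -> D'.
W_h = [rng.normal(size=(D_prime, D))] + \
      [rng.normal(size=(D_prime, D_prime)) for _ in range(L - 1)]

h = x
internal_vectors = []                       # h(1), ..., h(L)
for W_hk in W_h:
    h = sigmoid(W_hk @ h)                   # Formula (1) applied at layer k
    internal_vectors.append(h)

print([v.shape for v in internal_vectors])  # L vectors of dimension D'
```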
- The transform unit group 303 includes a set of L transform units 303(1) to 303(L). At the time of learning by the learning unit 261, each of the transform units 303(1) to 303(L) receives an input of the internal vector h(k) of the same layer and uses the learning parameter Wk ∈ R^(D×D′) to calculate a transform vector v(k)α ∈ R^D for its layer by using the following Formula (2).
- $v^{(k)}_{\alpha} = W^{\alpha}_{k\beta}\, h_{\beta}$  (2)
- the above-mentioned Formula (2) uses the Einstein summation convention.
- For example, Z_α = X_αβ Y_β, where:
- X is a matrix consisting of α rows and β columns,
- Y is a matrix consisting of β rows and one column, and
- Z is a matrix, or vector, consisting of α rows and one column.
- the Einstein summation convention is used for formulae for explaining operation.
- The indices α and β are omitted in some cases.
- Each transform vector v(k)α is input to a downstream transform unit 303e.
- The transform unit 303e averages the transform vectors v(k)α and outputs a transform vector Vα.
- The transform vector Vα is output to the downstream reallocation unit 304.
- At the time of prediction, the transform unit group 303 receives an input of the internal vector h′(k) of the same layer and uses the learning parameter Wk to calculate a transform vector v′(k)α for each layer.
- The transform unit 303e averages the transform vectors v′(k)α and outputs a transform vector V′α.
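A minimal sketch of the transform unit group and the averaging performed by the transform unit 303e, per Formula (2); the sizes and random stand-in values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, D_prime, L = 4, 8, 3
# Internal vectors h(1)..h(L) as produced by the neuron group via Formula (1); random stand-ins here.
internal_vectors = [rng.normal(size=D_prime) for _ in range(L)]
# Learning parameters W_k of the transform units, each mapping R^{D'} back to R^{D}.
W = [rng.normal(size=(D, D_prime)) for _ in range(L)]

# Formula (2): one transform vector v(k) per layer.
v = [W_k @ h_k for W_k, h_k in zip(W, internal_vectors)]
# Transform unit 303e: average the per-layer vectors into a single vector V.
V = np.mean(np.stack(v), axis=0)
print(V.shape)  # (D,)
```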
- The reallocation unit 304 receives an input of the feature vector x′(n) and the averaged transform vector V′, and calculates the reallocation vector r′α ∈ R^D by using the above-mentioned Formulas (2) and (3) and the learning parameter W generated based on the following Formulas (5) and (6).
- The decision unit 305 receives an input of the reallocation vector rα and calculates a predicted value p(n) corresponding to the response variable Y(n) by using the following Formula (4).
- $p_i = \mathrm{softmax}\bigl(w^{i}_{\alpha}\, r_{\alpha}\bigr)$  (4)
- softmax is the softmax function
- wi ∈ R^(D×I) is the learning parameter of a class value i
- The learning unit 261 uses a statistical gradient method: it receives an input of the combination of the response variable Y(n), the binary variable e(n) representing whether the data is censored or non-censored, and the probability value p(n)i, and calculates the learning parameters 265, {Wh, W, w}, so as to minimize the DeepHit loss function (see Non-Patent Document 1) illustrated in the following Formulas (5) and (6).
- The predicted probability F(n)i defined by the above-mentioned Formula (6) indicates the probability that the patient identified by patient data n dies before completion of the time class i.
- the third term specifies the magnitude relationship of the time classes Y.
- The learning unit 261 stores, in the server DB 263, the learning parameters 265, {Wh, W, w}, generated based on the above-mentioned Formulas (5) and (6).
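Formula (5) itself is not reproduced in this text. Following Non-Patent Document 1, a plausible DeepHit-style loss with the three terms described above (a likelihood term for non-censored data, a likelihood term for censored data, and a ranking term that enforces the magnitude relationship of the time classes) could be sketched as follows; the weight λ and the pairwise penalty η are assumptions, not the patent's exact formula:

```latex
\mathcal{L}
  = -\sum_{n} e^{(n)} \log p^{(n)}_{Y^{(n)}}
    \;-\; \sum_{n} \bigl(1 - e^{(n)}\bigr) \log\!\bigl(1 - F^{(n)}_{Y^{(n)}}\bigr)
    \;+\; \lambda \sum_{\substack{m,n:\ e^{(m)} = 1,\\ Y^{(m)} < Y^{(n)}}}
          \eta\!\bigl(F^{(m)}_{Y^{(m)}},\, F^{(n)}_{Y^{(m)}}\bigr),
```

where F(n)i is the cumulative probability of Formula (6) and η grows when the model fails to assign the earlier event a larger cumulative probability than the later one.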
- The importance unit 306 gives the test data feature vector x′(n) to the neural network 300 in which the learning parameters 265, {Wh, W, w}, are reflected, and calculates an importance vector ξ.
- the decision unit 305 uses the following Formula (8) to calculate the predicted value p′ i (n) .
- $p'^{(n)}_{i} = \mathrm{softmax}\bigl(\xi^{i}_{\alpha}(\vec{x}'^{(n)})\, x'^{(n)}_{\alpha}\bigr)$  (8)
- The importance vector ξαi(x′(n)) on the right-hand side corresponds to the parameter of a local plane for classifying the test data feature vector x′(n) as belonging to the time class of the class value i.
- That is, the importance vector ξαi(x′(n)) corresponds to the parameter of the local plane 103.
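A minimal sketch of how the importance vector and the class-wise predictions fit together, per Formulas (7) and (8); the sizes and the random stand-ins for x′, V′, and w are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
D, I = 4, 3                          # feature dimension and number of time classes
x_test = rng.normal(size=D)          # test feature vector x'
V_test = rng.normal(size=D)          # averaged transform vector V' obtained for x'
w = rng.normal(size=(I, D))          # decision-unit parameters w_i, one row per class value

xi = w * V_test                      # Formula (7): xi_i(x') = w_i (Hadamard product) V'
p_test = softmax(xi @ x_test)        # Formula (8): p'_i = softmax(xi_i(x') . x')
print(p_test)                        # predicted hazard values for class values 0..I-1
print(xi[0])                         # local plane (importance) for class value 0
```

Each row xi[i] plays the role of the locally linear classifier (the plane 103 in FIG. 1) that explains the prediction for class value i.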
- FIG. 4 is a flowchart illustrating an example of a learning and prediction process procedure performed by an analyzing apparatus.
- Steps S 401 and S 402 correspond to the phase of learning executed by the learning unit 261
- Steps S 403 to S 407 correspond to the phase of prediction executed by the predicting unit 262 .
- the learning unit 261 reads out the training data set 264 from the server DB 263 in Step S 401 , and executes a learning parameter generation process in Step S 402 .
- the learning unit 261 gives, to the neural network 300 , the feature vector x (n) which is part of the training data set 264 to thereby calculate the internal vector h (k) based on the above-mentioned Formula (1) in Step S 421 .
- The learning unit 261 calculates a transform vector v(k)α for each layer k based on the above-mentioned Formula (2), and in Step S422 calculates the transform vector Vα by averaging them.
- The learning unit 261 calculates the reallocation vector rα based on the above-mentioned Formula (3) in Step S423.
- In Step S424, the learning unit 261 calculates, for each class value i, the probability of death for the time class i, that is, the predicted value pi of the hazard function, based on the above-mentioned Formula (4).
- In Step S425, the learning unit 261 gives, to the above-mentioned Formulas (5) and (6), the predicted value p(n)i calculated based on the above-mentioned Formula (4) and the response variable Y(n), which is part of the training data set 264, to thereby optimize the learning parameters 265, {Wh, W, w}.
- As a result, the optimized learning parameters 265, {Wh, W, w}, are generated.
- The learning unit 261 stores the generated learning parameters 265, {Wh, W, w}, in the server DB 263 in Step S426.
- the predicting unit 262 reads out, from the client DB 251 , the feature vector x′ (n) , which is the test data set 252 , in Step S 403 .
- the predicting unit 262 calculates the importance of the feature amount in Step S 404 . Specifically, for example, by using the neuron group 302 , the predicting unit 262 gives, to the above-mentioned Formula (1), the feature vector x′ (n) and the optimized learning parameter W h to generate the internal vector h′ (k) .
- The predicting unit 262 gives, to the above-mentioned Formula (2), the internal vector h′(k) and the optimized learning parameter Wk to generate the transform vector v′(k), and averages the generated transform vectors v′(1) to v′(L) to generate the transform vector V′α. Then, by using the importance unit 306, the predicting unit 262 gives, to the above-mentioned Formula (7), the optimized learning parameter wαi and the transform vector V′α to calculate the importance vector ξαi(x′(n)) of the feature vector x′.
- In Step S405, the predicting unit 262 gives, to the above-mentioned Formula (8), the feature vector x′(n) and the importance vector ξαi(x′(n)) determined based on the above-mentioned Formula (7), to calculate the predicted value p′i(n) of the hazard function for each class value i.
- The predicting unit 262 stores, in the client DB 251 and as the prediction result 253, a combination of the calculated predicted value p′i(n) of the hazard function and the importance vector ξαi(x′(n)), in Step S406. Thereafter, the client terminal 200 displays the prediction result 253 on the monitor 225 in Step S407.
- In this manner, the analyzing apparatus 220 in the first embodiment can facilitate, with high accuracy and efficiency, the explanation of the predicted value p′i(n).
- the analyzing apparatus 220 may store the prediction result 253 in the server DB 263 .
- the analyzing apparatus 220 may transmit the prediction result 253 to the client terminal 200 to allow the client terminal 200 to display the prediction result 253 on the monitor 225 .
- FIG. 5 is an explanatory diagram illustrating a neural network setting screen example.
- a neural network setting screen 500 can be displayed on the monitors 205 and 225 . If the setting screen 500 is displayed on the monitor 205 , a neural network can be set in the client terminal 200 , and if the setting screen 500 is displayed on the monitor 225 , a neural network can be set in the analyzing apparatus 220 .
- a user edits detailed settings of a neural network on an attribute panel 501 .
- “Inner Layer Number” on the attribute panel 501 corresponds to the number of layers L of the neuron group 302 .
- the number of layers of the neuron group 302 is L.
- "Number of neurons" on the attribute panel 501 corresponds to the number of dimensions D′ of the internal vector h(k).
- the training data set 264 is set in the server DB 263
- the test data set 252 is set in the client DB 251 .
- An output panel 504 displays the prediction result 253 of the prediction process illustrated in FIG. 4 .
- FIG. 6 is an explanatory diagram illustrating a display example of the output panel 504 .
- the display screen 600 displays the prediction result 253 on the output panel 504 .
- The "Probability" value "57%" is the predicted value p′i(n).
- The percentages of the feature amounts x1 to x9 are numerical values representing the values of the importance vector ξαi(x′(n)) normalized as percentages.
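One way such a percentage display could be produced is sketched below; the absolute-value normalization and the example numbers are assumptions, since the patent does not specify how the normalization is done:

```python
import numpy as np

# Importance values xi_i(x') for features x1..x9 (invented numbers).
xi_i = np.array([0.8, -0.4, 0.2, 0.1, -0.3, 0.05, 0.6, -0.1, 0.15])
percent = 100.0 * np.abs(xi_i) / np.abs(xi_i).sum()
for k, p in enumerate(percent, start=1):
    print(f"x{k}: {p:.0f}%")
```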
- The neuron group 302 may be branched at an intermediate layer k.
- The neuron group from the neuron 302(1) in the first layer to the neuron 302(k) in the middle layer k is referred to as the first neuron group.
- The neuron group from the neuron 302(k+1) in the layer (k+1), which is one layer lower than the middle layer k, to the neuron 302(L) in the lowermost layer L is referred to as the latter neuron group.
- the number of branches in the latter neuron group is equal to the number of analysis targets.
- the numbers of response variables Y (n) , and binary variables e (n) are also equal to the number of analysis targets.
- The number of feature vectors x(n) is independent of the number of analysis targets and is one.
- For example, the neural network 300 including branches can predict survival time according to multiple types of death factors, or feature amounts, related to cancer-related deaths as the analysis target corresponding to one branch destination, and can predict survival time according to multiple types of death factors, or feature amounts, related to non-cancer-related deaths as the analysis target corresponding to the other branch destination.
- FIG. 7 is an explanatory diagram illustrating another structural example of the neural network 300 .
- The number of layers L is 4.
- A is appended to the ends of the signs of constituent elements related to one branch destination, and B is appended to the ends of the signs of constituent elements related to the other branch destination.
- Neurons whose signs differ only in their suffixes A and B have the same functions but have learning parameters with different values.
- For example, the neurons 302(3)A and 302(3)B both calculate the internal vector h(3) based on the above-mentioned Formula (1), but their learning parameters Wh3 differ from each other. Note that although each of the neurons 302(1) and 302(2) before the branch is illustrated for every branch destination in FIG. 7 for ease of explanation, those neurons need not be provided for every branch destination.
- By sharing the neurons 302(1) and 302(2) and the transform units 303(1) and 303(2), learning and prediction corresponding to the number of branches can be performed by using one feature vector. Note that although the number of branches is two in FIG. 7, it may be three or more.
- The analyzing apparatus 220 in the first embodiment can predict the survival time of breast cancer patients by using, as feature vectors, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) data of breast cancer patients.
- The METABRIC data is a data set created by the METABRIC for performing sub-group classification of breast cancer, and consists of gene expression information, clinical features, survival time, and censoring status for 1,980 breast cancer patients.
- Of the gene expression information in the METABRIC data, only the expression information obtained with the genetic markers MKI67, EGFR, PGR, and ERBB2, which are typically used for selecting treatment methods for breast cancer patients, is used.
- Factors identified by the importance output by the analyzing apparatus 220 based on the first embodiment allow doctors to give prognosis instructions appropriate for individual breast cancer patients. This contributes to the improvement in quality of medical care, and also leads to reduction in national medical expenditure and health expenditure.
- The response variables Y(n) are set to survival times measured in months.
- FIG. 8 is a table illustrating experimental results. Specifically, the table 800 in FIG. 8 illustrates the experimental results of comparing a classifier based on the Cox proportional hazard model, a classifier based on DeepHit (see Non-Patent Document 1), and the analyzing apparatus 220 according to the first embodiment, using 10-fold cross validation with the concordance index (C-index) as the evaluation scale.
- The C-index values are 0.63 for the Cox proportional hazard model, 0.64 for DeepHit, and 0.66 for the analyzing apparatus 220 ("Proposed") according to the first embodiment.
- That is, the analyzing apparatus 220 according to the first embodiment achieved better performance than the conventional methods.
- Uses of the analyzing apparatus 220 are not limited to the medical field; for example, the analyzing apparatus 220 can also be applied to video distribution services.
- The operator of a video distribution service can learn the factors that are likely to lead to cancellation during contract periods and can attempt to improve its services.
- a second embodiment illustrates an example in which the Cox regression model is applied to the analyzing apparatus 220 .
- In the second embodiment, the analyzing apparatus 220 explained as an example predicts the hazard function of a press machine at a factory and also outputs the factors that contribute to the prediction.
- Predicted values output by the analyzing apparatus 220 according to the second embodiment make it possible to take preventive measures such as maintenance before a malfunction of the press machine occurs, avoiding the costs required for replacement of the press machine and the losses caused by stopping its operation during the replacement. Furthermore, the factors identified by the importance output by the analyzing apparatus 220 according to the second embodiment make it possible to take preventive measures efficiently and properly before malfunctions occur. This enables reduction of asset maintenance costs and efficient operation of facilities, and contributes to performance improvement in the manufacturing industry.
- Training data is sample data constituted by, for example, combinations (x(n), T(n), e(n)) of feature vectors x(n), response variables T(n), and binary variables e(n), with a value of 0 corresponding to censored samples and 1 corresponding to non-censored samples.
- Here, n ∈ {1, . . . , N} is an index for specifying a certain piece of sample data.
- A feature vector x(n) ∈ R^D is a D-dimensional real-valued vector and includes specification information such as the materials and manufacture date of the machine, and sensor information such as voltage, vibration, and temperature.
- A response variable T(n) is the survival time of the press machine.
- wα ∈ R^D is a learning parameter, and
- hCox is the predicted value of the hazard function in the Cox regression model.
- The exponential regression model, the Weibull regression model, or the log-logistic regression model may be used instead of the Cox regression model represented by Formula (9).
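For reference, standard textbook forms of those alternative hazard models, written with a risk score λ = exp(wα rα) as in Formula (9), are shown below; these concrete parameterizations are an assumption here, since the patent does not spell them out:

```latex
h_{\text{exponential}}(t) = \lambda,
\qquad
h_{\text{Weibull}}(t) = \lambda\, p\, t^{\,p-1},
\qquad
h_{\text{log-logistic}}(t) = \frac{\lambda\, p\, t^{\,p-1}}{1 + \lambda\, t^{\,p}},
```

where p > 0 is a shape parameter.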
- dn is the number of samples whose survival time is T(n),
- D(T(n)) is the set of samples whose survival time is T(n), and
- R(T(n)) is the set of samples whose survival time is equal to or longer than T(n).
- the Cox partial likelihood function formula based on the Breslow method or Exact method can be used instead of Formula (11).
- the logarithmic likelihood function of the model can be used.
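Formula (11) is not reproduced in this text; with dn, D(T(n)), and R(T(n)) defined above and the risk score exp(wα rα) from Formula (9), a Breslow-style Cox partial likelihood takes the standard form sketched below (an assumption about the concrete form, not the patent's exact formula):

```latex
L(w) \;=\; \prod_{n:\, e^{(n)} = 1}
\frac{\exp\!\Bigl(\sum_{j \in D(T^{(n)})} w_{\alpha}\, r^{(j)}_{\alpha}\Bigr)}
     {\Bigl(\sum_{j \in R(T^{(n)})} \exp\!\bigl(w_{\alpha}\, r^{(j)}_{\alpha}\bigr)\Bigr)^{d_n}},
```

and the learning parameters are chosen so as to maximize it, equivalently to minimize its negative logarithm.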
- The present invention is not limited to the embodiments mentioned above, but includes various variants and equivalent configurations within the gist of the attached claims.
- the embodiments mentioned above are explained in detail in order to explain the present invention in an easy-to-understand manner, and the present invention is not necessarily limited to those including all the explained configurations.
- some of configurations of an embodiment may be replaced with configurations of another embodiment.
- configurations of an embodiment may be added to configurations of another embodiment.
- Other configurations may be added to, removed from, or substituted for some of the configurations of each embodiment.
- Each configuration, functionality, processing unit, processing means, or the like mentioned above may be realized in hardware by, for example, designing part or all of it as an integrated circuit, or realized in software by a processor interpreting and executing a program that realizes the corresponding functionality.
- Information in a program, a table, a file, or the like that realizes each functionality can be stored on a storage apparatus such as a memory, a hard disk, or a solid state drive (SSD), or on a recording medium such as an integrated circuit (IC) card, an SD card, or a digital versatile disc (DVD).
- The control lines and information lines illustrated are those deemed necessary for explanation, and not all the control lines and information lines necessary for implementation are necessarily illustrated. In practice, almost all the configurations may be considered to be connected to each other.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Nonlinear Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
[Formula 1]
$\vec{h}^{(k)} = \sigma(W_{hk}\,\vec{x})$  (1)
where $\vec{x}$ is the feature vector x.
[Formula 2]
$v^{(k)}_{\alpha} = W^{\alpha}_{k\beta}\, h_{\beta}$  (2)
[Formula 3]
$r_{\alpha} = V_{\alpha} \odot x_{\alpha}$  (3)
The operator ⊙ is the Hadamard product.
[Formula 4]
$p_i = \mathrm{softmax}\bigl(w^{i}_{\alpha}\, r_{\alpha}\bigr)$  (4)
where $\mathbb{1}(A)$ is an indicator function that gives 1 if the conditional expression represented by A is satisfied, and gives 0 if not.
$F^{(n)}_{i} = \sum_{j=0}^{i} p^{(n)}_{j}$  (6)
[Formula 6]
$\xi^{i}_{\alpha}(\vec{x}') = w^{i}_{\alpha} \odot V'_{\alpha}$  (7)
[Formula 7]
$p'^{(n)}_{i} = \mathrm{softmax}\bigl(\xi^{i}_{\alpha}(\vec{x}'^{(n)})\, x'^{(n)}_{\alpha}\bigr)$  (8)
[Formula 8]
$h_{\mathrm{Cox}} = \exp(w_{\alpha}\, r_{\alpha})$  (9)
Claims (11)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JPJP2018-202952 | 2018-10-29 | ||
JP2018-202952 | 2018-10-29 | ||
JP2018202952A JP7059162B2 (en) | 2018-10-29 | 2018-10-29 | Analytical instruments, analytical methods, and analytical programs |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200134430A1 US20200134430A1 (en) | 2020-04-30 |
US11568213B2 true US11568213B2 (en) | 2023-01-31 |
Family
ID=70326921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/595,526 Active 2041-08-19 US11568213B2 (en) | 2018-10-29 | 2019-10-08 | Analyzing apparatus, analysis method and analysis program |
Country Status (2)
Country | Link |
---|---|
US (1) | US11568213B2 (en) |
JP (1) | JP7059162B2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312393B (en) * | 2020-01-14 | 2022-02-22 | 之江实验室 | Time sequence deep survival analysis system combined with active learning |
KR20230084523A (en) | 2020-10-07 | 2023-06-13 | 고쿠리츠다이가쿠호진 니이가타 다이가쿠 | Software providing device, software providing method and program |
US11480956B2 (en) * | 2020-10-15 | 2022-10-25 | Falkonry Inc. | Computing an explainable event horizon estimate |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140335126A1 (en) * | 2011-12-05 | 2014-11-13 | Duke University | V1v2 immunogens |
CN106897545A (en) | 2017-01-05 | 2017-06-27 | 浙江大学 | A kind of tumor prognosis forecasting system based on depth confidence network |
CN108130372A (en) | 2018-01-17 | 2018-06-08 | 华中科技大学鄂州工业技术研究院 | A kind of method and device for the instruction of acute myeloid leukemia drug |
US20190065991A1 (en) * | 2017-08-31 | 2019-02-28 | Accenture Global Solutions Limited | Machine learning document processing |
US20190244094A1 (en) * | 2018-02-06 | 2019-08-08 | Sap Se | Machine learning driven data management |
US20190316209A1 (en) * | 2018-04-13 | 2019-10-17 | Grail, Inc. | Multi-Assay Prediction Model for Cancer Detection |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8078554B2 (en) * | 2008-09-03 | 2011-12-13 | Siemens Medical Solutions Usa, Inc. | Knowledge-based interpretable predictive model for survival analysis |
JP6646552B2 (en) * | 2016-09-13 | 2020-02-14 | 株式会社日立ハイテクノロジーズ | Image diagnosis support apparatus, image diagnosis support method, and sample analysis system |
JP7059151B2 (en) * | 2018-09-12 | 2022-04-25 | 株式会社日立製作所 | Time series data analyzer, time series data analysis method, and time series data analysis program |
-
2018
- 2018-10-29 JP JP2018202952A patent/JP7059162B2/en active Active
-
2019
- 2019-10-08 US US16/595,526 patent/US11568213B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140335126A1 (en) * | 2011-12-05 | 2014-11-13 | Duke University | V1v2 immunogens |
CN106897545A (en) | 2017-01-05 | 2017-06-27 | 浙江大学 | A kind of tumor prognosis forecasting system based on depth confidence network |
US20190065991A1 (en) * | 2017-08-31 | 2019-02-28 | Accenture Global Solutions Limited | Machine learning document processing |
CN108130372A (en) | 2018-01-17 | 2018-06-08 | 华中科技大学鄂州工业技术研究院 | A kind of method and device for the instruction of acute myeloid leukemia drug |
US20190244094A1 (en) * | 2018-02-06 | 2019-08-08 | Sap Se | Machine learning driven data management |
US20190316209A1 (en) * | 2018-04-13 | 2019-10-17 | Grail, Inc. | Multi-Assay Prediction Model for Cancer Detection |
Non-Patent Citations (6)
Title |
---|
Changhee Lee, et al., "DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks", The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018, pp. 2314-2321. |
Che et al., "Interpretable Deep Models for ICU Outcome Prediction", AMIA annual symposium proceedings, vol. 2016, pp. 371-380 (Year: 2016). * |
Lee et al., "DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks", The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), Published Apr. 26, 2018 (Year: 2018). * |
Marco Tulio Ribeiro, et al., "Why Should I Trust You?: Explaining the Predictions of Any Classifier", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016. |
Su et al., "Long-term Blood Pressure Prediction with Deep Recurrent Neural Networks", 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Mar. 4-7, 2018 (Year: 2018). * |
Trevor Hastie, et al., "The Elements of Statistical Learning", Second edition. New York: Springer series in statistics, 2001. |
Also Published As
Publication number | Publication date |
---|---|
JP7059162B2 (en) | 2022-04-25 |
JP2020071517A (en) | 2020-05-07 |
US20200134430A1 (en) | 2020-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jamei et al. | Predicting all-cause risk of 30-day hospital readmission using artificial neural networks | |
Jung et al. | A framework for making predictive models useful in practice | |
van Assen et al. | Artificial intelligence from A to Z: from neural network to legal framework | |
Yu et al. | Predicting readmission risk with institution-specific prediction models | |
AU2012245343B2 (en) | Predictive modeling | |
US11568213B2 (en) | Analyzing apparatus, analysis method and analysis program | |
US11437146B2 (en) | Disease development risk prediction system, disease development risk prediction method, and disease development risk prediction program | |
Funkner et al. | Data-driven modeling of clinical pathways using electronic health records | |
Tighe et al. | Use of machine learning theory to predict the need for femoral nerve block following ACL repair | |
Sahoo et al. | Potential diagnosis of COVID-19 from chest X-ray and CT findings using semi-supervised learning | |
Bing et al. | Conditional generation of medical time series for extrapolation to underrepresented populations | |
Richardson et al. | Association of race/ethnicity with mortality in patients hospitalized with COVID-19 | |
Lenatti et al. | A novel method to derive personalized minimum viable recommendations for type 2 diabetes prevention based on counterfactual explanations | |
Ball | Improving healthcare cost, quality, and access through artificial intelligence and machine learning applications | |
Brzan et al. | Contribution of temporal data to predictive performance in 30-day readmission of morbidly obese patients | |
Gopukumar et al. | Predicting readmission charges billed by hospitals: machine learning approach | |
Xing et al. | Non-imaging medical data synthesis for trustworthy AI: A comprehensive survey | |
Alsinglawi et al. | Benchmarking predictive models in electronic health records: Sepsis length of stay prediction | |
Saranya et al. | Cancer prognosis with machine learning-based modified meta-heuristics and weighted gradient boosting algorithm | |
Rahman et al. | Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model | |
Jiang et al. | Concave 1-norm group selection | |
Wee et al. | Notice of Removal: Automated Triaging Medical Referral for Otorhinolaryngology Using Data Mining and Machine Learning Techniques | |
Martinez et al. | Understanding and Predicting Cognitive Improvement of Young Adults in Ischemic Stroke Rehabilitation Therapy | |
Sharifi et al. | A cluster-based machine learning model for large healthcare data analysis | |
Kaushik et al. | Disease management: clustering–based disease prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HITACHI, LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YAMASHITA, YASUHO;SHIBAHARA, TAKUMA;SUZUKI, MAYUMI;SIGNING DATES FROM 20190911 TO 20190912;REEL/FRAME:050648/0209 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |