WO2023284321A1 - Method and device for predicting survival hazard ratio

Info

Publication number: WO2023284321A1
Application number: PCT/CN2022/081403
Authority: WO
Inventors: 乔楠; 林歆远; 徐迟
Original assignee: 华为云计算技术有限公司
Priority date: 2021-07-15
Filing date: 2022-03-17
Publication date: 2023-01-19
Also published as: CN115620902A

Abstract

The present application relates to the technical field of artificial intelligence, and discloses a method and device for predicting a survival hazard ratio (HR). The method comprises: obtaining data of a sample to be predicted; and inputting the data of said sample into a preset model, and processing the data of said sample by means of the preset model to obtain a survival HR for representing a survival hazard of said sample. The preset model comprises a gating network and a plurality of expert networks, the gating network is used for determining, according to the data of said sample, a weight coefficient corresponding to each expert network, and the survival HR is a result obtained by performing weighted summation on output values of the plurality of expert networks according to the weight coefficient corresponding to each expert network.

Description

Method and device for predicting survival hazard ratio

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a method and device for predicting a survival hazard ratio (HR).

Background

Survival analysis refers to a family of statistical methods used to study the time until a target event occurs, for example the survival time of cancer patients or the time to failure of equipment.

Usually, when performing survival analysis on a target event, an analysis model is built from data obtained in advance through surveys or experiments. The model is then used to predict how one or more characteristic variables that affect the occurrence of the target event influence the survival curve of that event. For example, by establishing a Cox proportional hazards regression model (Cox proportional hazards model, coxPH) and inputting into it one or more characteristic variables that affect the occurrence of the target event, the risk of the target event occurring at different times can be predicted. The risk of the target event occurring at different times reflects the survival curve of the observed event, where the ending event of the observed event is the target event. The coxPH model can be expressed as formula (1):

Formula (1): $h(t) = h_0(t) \times \exp(b_1 x_1 + b_2 x_2 + \dots + b_p x_p)$

Here, $t$ is the survival time, and $h(t)$ is the hazard function of the target event, which represents the risk that the target event (for example, death) occurs when the survival time is $t$. $h_0(t)$ is the baseline hazard function, which is usually determined in advance from the survival curves of a large number of samples. $x_1, x_2, \dots, x_p$ are the $p$ covariates, that is, the characteristic variables that affect the target event to be predicted, and $b_1, b_2, \dots, b_p$ are the regression coefficients of the covariates. The coxPH model is therefore a linear model: the covariates enter the exponent only through a linear combination, so the model can only be used to analyze data in which the input features and the learning objective (i.e., the risk of occurrence of the target event) are linearly related.
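Formula (1) can be evaluated directly. The following is a minimal sketch, not part of the application: the function name `cox_hazard`, the constant baseline hazard of 0.01 and the coefficient values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Callable, Sequence


def cox_hazard(
    t: float,
    baseline_hazard: Callable[[float], float],
    coefficients: Sequence[float],
    covariates: Sequence[float],
) -> float:
    """Evaluate formula (1): h(t) = h0(t) * exp(b1*x1 + ... + bp*xp)."""
    if len(coefficients) != len(covariates):
        raise ValueError("need one regression coefficient per covariate")
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard(t) * math.exp(linear_predictor)


def constant_baseline(t: float) -> float:
    """Hypothetical baseline hazard: 0.01 per unit time, independent of t."""
    return 0.01


b = [0.5, -0.2]    # illustrative regression coefficients
x_a = [1.0, 3.0]   # covariates of sample A
x_b = [0.0, 3.0]   # sample B differs from A only in x1

h_a = cox_hazard(10.0, constant_baseline, b, x_a)
h_b = cox_hazard(10.0, constant_baseline, b, x_b)
# The baseline hazard cancels, so the ratio equals exp(b1 * (1 - 0)) at any t.
ratio = h_a / h_b
```

Because the baseline hazard cancels in the ratio, two samples are compared through the exponential term alone, which is why that term is read as a hazard ratio.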

However, in practical applications, the relationship between the characteristic variables and the occurrence of the target event is usually nonlinear. The linear coxPH model therefore cannot perform survival analysis on the target event accurately. How to improve the accuracy of survival analysis is thus a technical problem that urgently needs to be solved.

Summary of the invention

The present application provides a method and device for predicting survival risk rate, which can improve the accuracy of survival analysis.

In order to achieve the above object, the application provides the following technical solutions:

In a first aspect, the present application provides a method for predicting a survival hazard ratio. The method comprises: acquiring data of a sample to be predicted; inputting the data of the sample to be predicted into a preset model; and processing the data with the preset model to obtain a survival hazard ratio (HR) that represents the survival risk of the sample to be predicted. The preset model includes a gating network and a plurality of expert networks. The gating network is used to determine, according to the data of the sample to be predicted, the weight coefficient corresponding to each expert network. The survival hazard ratio output by the preset model is the result of a weighted summation of the output values of the plurality of expert networks according to the weight coefficient corresponding to each expert network.

With the method provided in this application, because the preset model includes multiple expert networks and a gating network that determines the weight coefficients of the expert networks, the preset model can integrate the results that the multiple expert networks output for the data of the sample to be predicted. The survival hazard ratio predicted by the preset model is therefore more accurate, and so is the survival curve determined from it. Moreover, the preset model can be trained end to end.

In a possible design, the above method further includes: determining the hazard function of the sample to be predicted based on the above survival hazard ratio and a baseline hazard function, where the hazard function is used to indicate the survival rate of the sample to be predicted at different times.
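As an illustration of this design, the sketch below scales a baseline hazard by a predicted HR and converts the result into a survival curve. It rests on assumptions that the text does not spell out: the baseline hazard is given as one value per time step, and survival is obtained through the standard relation S(t) = exp(-cumulative hazard). The function names and the numbers are hypothetical.

```python
import math
from typing import List, Sequence


def sample_hazard(baseline_hazard: Sequence[float], hr: float) -> List[float]:
    """Scale a baseline hazard, given per time step, by the sample's HR."""
    return [h0 * hr for h0 in baseline_hazard]


def survival_curve(hazard: Sequence[float], step: float = 1.0) -> List[float]:
    """Approximate S(t) = exp(-cumulative hazard) on a regular time grid."""
    curve: List[float] = []
    cumulative = 0.0
    for h in hazard:
        cumulative += h * step
        curve.append(math.exp(-cumulative))
    return curve


baseline = [0.10, 0.10, 0.20]   # hypothetical baseline hazard per day
low_risk = survival_curve(sample_hazard(baseline, hr=0.5))
high_risk = survival_curve(sample_hazard(baseline, hr=2.0))
```

With the same baseline, the sample with the larger HR has the lower survival curve at every time step, which matches the reading of HR as a measure of survival risk.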

Here, the survival hazard ratio is the one predicted for the sample to be predicted by the above preset model after it processes the data of that sample. This possible design thus realizes survival analysis of the sample to be predicted. Because the survival hazard ratio predicted by the method provided in this application is highly accurate, the hazard function determined from it, which indicates the survival rate of the sample to be predicted at different times, is also relatively accurate.

In another possible design, any expert network among the plurality of expert networks in the above preset model includes at least one candidate residual fully connected neural network (RFCN), and the output value of that expert network is the output value, among the output values of the at least one candidate RFCN, that satisfies a preset condition.

In this possible design, the learning result of the candidate RFCN that satisfies the preset condition, among the results of the multiple candidate RFCNs in an expert network, is used as the output of that expert network. This reflects the idea of selecting the best candidate and thereby improves the prediction accuracy for the sample to be predicted.

In another possible design manner, the data of the samples to be predicted include non-Euclidean data.

Non-Euclidean data is data that is not arranged in a regular order. In practical applications, the amount of non-Euclidean data is huge and its structure is complex. Usually, the relationship between the non-Euclidean data in the data of the sample to be predicted and the survival rate of that sample is nonlinear. With this possible design, the method provided in the embodiments of the present application can process and analyze the data of such samples.

In another possible design, the above method further includes: interpreting the preset model based on the data of the sample to be predicted and the survival hazard ratio of the sample to be predicted, to obtain the impact of different feature data in the data of the sample to be predicted on the survival hazard ratio.

In another possible design, when the data of the sample to be predicted is a patient's case data, interpreting the preset model to obtain the impact of different feature data on the survival hazard ratio includes: interpreting the preset model based on the patient's case data and the patient's survival hazard ratio, to obtain the impact of different feature data in the patient's case data on the patient's survival hazard ratio.

In another possible design, when the data of the sample to be predicted is the data of equipment, interpreting the preset model to obtain the impact of different feature data on the survival hazard ratio includes: interpreting the preset model based on the data of the equipment and the survival hazard ratio of the equipment, to obtain the impact of different feature data in the data of the equipment on the survival hazard ratio of the equipment.

With these possible designs, domain experts can use the impact of different feature data on the survival hazard ratio, obtained by the method provided in this application, to guide practice. For example, for a patient, clinicians can adjust the clinical treatment plan based on the impact of different treatment data in the patient's case data on the patient's survival hazard ratio. As another example, for equipment, engineers can improve and optimize the equipment based on the impact of its different feature data on its survival hazard ratio.

In another possible design, the above method further includes: training an initial model with the data of training samples to obtain the preset model. The initial model includes an initial gating network and multiple initial expert networks.

In another possible design, training the initial model with the data of the training samples includes: inputting the data of the training samples into the initial gating network and the multiple initial expert networks in the initial model; obtaining the weight coefficient of each initial expert network from the initial gating network, and performing a weighted summation of the output values of the multiple initial expert networks according to those weight coefficients to obtain the predicted survival hazard ratio of the training samples; determining a loss function based on the predicted survival hazard ratios of the training samples and the survival data of the training samples; and adjusting the network parameters of the initial gating network and the multiple initial expert networks based on the loss function.

Here, the survival data of a training sample includes the time at which the training sample is observed and the survival state of the training sample at that time. The observation time may be the survival time of the training sample, or any time after the starting event of the training sample occurs and before its ending event occurs. The starting event and the ending event of the training sample depend on the application scenario of the preset model trained with it. For example, when the survival hazard ratio predicted by the preset model is used to study the efficacy of an anticancer drug, the starting event may be the patient starting to take the drug, and the ending event may be the death of the patient. When it is used to study the survival rate of patients after surgery, the starting event may be the operation on the patient, and the ending event may be the death of the patient. When it is used to study the service life of equipment, the starting event may be the delivery of the equipment/part, and the ending event may be the failure of the equipment, and so on. The survival state of a training sample is one of two states: survival or death. With these two possible designs, the preset model used to predict the survival hazard ratio can be obtained through end-to-end training.
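The passage above does not name the loss function. Purely as an assumption, the sketch below uses the negative Cox partial log-likelihood, a common choice for survival data made up of (time, state) pairs. It also assumes that the model output is treated as log(HR), and the function name and sample values are hypothetical.

```python
import math
from typing import Sequence


def neg_partial_log_likelihood(
    log_hr: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
) -> float:
    """Negative Cox partial log-likelihood, averaged over observed events.

    log_hr[i] : model output for sample i, taken here as log(HR) (assumption)
    times[i]  : observation time of sample i
    events[i] : 1 if the ending event was observed at times[i], else 0
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # samples without an event only appear in risk sets
        # Risk set: every sample still under observation at time t_i.
        risk = [math.exp(log_hr[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        total += log_hr[i] - math.log(sum(risk))
        n_events += 1
    return -total / n_events if n_events else 0.0


times = [2.0, 5.0, 8.0]
events = [1, 0, 1]
# Highest predicted risk for the earliest event: ordering agrees with the data.
loss_good = neg_partial_log_likelihood([1.0, 0.0, -1.0], times, events)
# Reversed ordering: the loss is larger.
loss_bad = neg_partial_log_likelihood([-1.0, 0.0, 1.0], times, events)
```

A loss of this kind depends only on how the predicted risks are ordered relative to the observed event times, so it can be minimized with respect to the parameters of the gating network and the expert networks together.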

In a second aspect, the present application provides a device for predicting survival risk.

In a possible design manner, the device for predicting the survival risk rate is used to implement any one of the methods provided in the first aspect above. The present application may divide the device for predicting survival risk into functional modules according to any one of the methods provided in the first aspect above. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. Exemplarily, the present application may divide the apparatus for predicting the survival risk rate into an acquisition unit, a processing unit, and the like according to functions. For the description of the possible technical solutions and beneficial effects performed by the above-mentioned divided functional modules, reference may be made to the technical solutions provided by the first aspect or its corresponding possible designs, and details will not be repeated here.

In another possible design, the device for predicting the survival hazard ratio includes one or more processors and a transmission interface. The one or more processors receive or send data through the transmission interface, and are configured to invoke program instructions stored in a memory, so that the device executes any method provided in the first aspect and any of its possible designs.

In a third aspect, the present application provides a computer-readable storage medium that includes program instructions. When the program instructions are run on a computer or a processor, the computer or the processor executes any method provided by any possible implementation of the first aspect.

In a fourth aspect, the present application provides a computer program product, which, when running on a device for predicting survival risk, causes any one of the methods provided in any one of the possible implementations in the first aspect to be executed.

It can be understood that any of the devices, computer storage media, or computer program products provided above for predicting the survival hazard ratio can be applied to the corresponding methods provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; details are not repeated here.

In this application, the name of the above-mentioned device for predicting the survival risk rate does not constitute a limitation on the device or functional module itself, and in actual implementation, these devices or functional modules may appear with other names. As long as the functions of each device or functional module are similar to those of the present application, they fall within the scope of the claims of the present application and their equivalent technologies.

Brief description of the drawings

Fig. 1 is a schematic diagram of a survival curve;

FIG. 2 is a schematic structural diagram of a prediction device provided in an embodiment of the present application;

FIG. 3 is a schematic flowchart of a training method for a preset model provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an initial model provided in the embodiment of the present application;

FIG. 5 is a schematic structural diagram of an expert network provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of a method for predicting survival risk provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a method for explaining a preset model provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of the survival curves of samples in each group after a preset model provided in the embodiment of the present application groups the samples in the sample set;

FIG. 9 is a histogram of the model consistency results obtained in internal and external validation of models trained on samples from hospital A, using the method provided by an embodiment of the present application and an existing method;

FIG. 10 is a schematic structural diagram of a device for predicting survival risk provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a signal carrying medium for carrying a computer program product provided by an embodiment of the present application.

Detailed description

In order to understand the embodiments of the present application more clearly, some terms or technologies involved in the embodiments of the present application are described below:

1), survival curve

The survival curve is the curve of the survival rate of the observed samples over time. The meaning of survival depends on the event it is contrasted with. As opposed to death, survival means that an organism is alive. As opposed to relapse or progression of a disease, survival means that the patient's disease is in remission. As opposed to failure of a device/system/part, survival means that the device/system/part is functioning normally. As opposed to customer churn, survival means that the customer is retained.

In practical applications, the survival curve can be used to reflect the recurrence of the disease after being cured, or the survival curve can be used to reflect the failure of the equipment/parts from the factory.

Take 1000 observed samples and an observation time measured in days as an example. FIG. 1 shows a schematic diagram of a survival curve, in which the horizontal axis represents the observation time and the vertical axis represents the survival rate of the observed samples. The survival rate of the 1000 samples over time may then be the survival curve 10 shown in FIG. 1. On the first day, the survival rate of the 1000 samples is 90%. On the second day, the survival rate drops to 45%; that is, of the samples that survived the first day, 50% survive the second day. On the third day, the survival rate drops to 20%; that is, of the samples that survived the second day, about 45% survive the third day, and so on.
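The arithmetic in this example can be checked with a short sketch. It reads the three figures as cumulative survival rates of 90%, 45% and 20% on days 1 to 3, which is an interpretation of the text rather than something confirmed against FIG. 1. The function name is hypothetical.

```python
from typing import List, Sequence


def conditional_survival(cumulative: Sequence[float]) -> List[float]:
    """Day-over-day survival rate, given survival up to the previous day."""
    result: List[float] = []
    previous = 1.0  # every sample is alive at the start of observation
    for s in cumulative:
        if previous == 0.0:
            raise ValueError("no samples left to condition on")
        result.append(s / previous)
        previous = s
    return result


cumulative = [0.90, 0.45, 0.20]              # assumed reading for days 1 to 3
per_day = conditional_survival(cumulative)   # day 2: 0.45 / 0.90, day 3: 0.20 / 0.45
survivors = [round(1000 * s) for s in cumulative]
```

Under this reading, 900, 450 and 200 of the 1000 samples are still alive after days 1, 2 and 3, and the day 3 conditional rate is 0.20 / 0.45, roughly 44%.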

In addition, for a sample, the survival curve of the sample is the curve of the survival probability of the sample changing with time.

For example, on the first day after surgery, the probability of survival for a patient is 0.3. On the second postoperative day, the probability of survival was 0.5. On the third postoperative day, the probability of survival is 0.8, and so on.

2), survival time

Survival time refers to the time elapsed from the starting event of the observed target to the occurrence of the ending event. Wherein, the ending event of the observation target is the target event mentioned above.

For example, if the observation target is the patient's postoperative survival, the starting event of the observation target may be the operation on the patient, and the outcome event of the observation target may be the death of the patient. In this case, the period from the operation to the patient's death can be called the postoperative survival time of the patient.

For another example, if the observation target is the service life of equipment/parts, the starting event of the observation target may be the completion of the production of the equipment/parts, and the ending event of the observation target may be the failure of the equipment/parts. In this case, the period from the completion of the production of the equipment/part to the failure of the equipment/part can be called the survival time of the equipment/part.

3) Survival hazard rate

The survival hazard rate is the probability of death of a sample within a unit of time. That is, the survival hazard ratio of the sample is used to express the survival risk of the sample.

In the hazard function expressed by formula (1) above, exp(b) is the survival hazard ratio. The higher the survival hazard ratio of a sample, the higher its mortality and the lower its survival rate.

4) Truncated data

Truncated data can also be called time-to-event data, which is data used to indicate whether an event occurs at a certain time.

For example, if a postoperative patient relapses one year after the operation, the patient's relapse and the time of relapse can be called truncated data.

Truncated data thus has two dimensions: a time dimension and an event dimension. In the time dimension, truncated data consists of continuous observation times. In the event dimension, truncated data consists of discrete event states, of which there are two: the event has occurred (i.e., event=1) and the event has not occurred (i.e., event=0).
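A record with these two dimensions can be represented as follows. This is only an illustrative sketch: the class name `SurvivalRecord`, its field names and the second example record are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SurvivalRecord:
    time: float   # continuous observation time, e.g. days since surgery
    event: int    # 1 = event has occurred at `time`, 0 = event has not occurred

    def __post_init__(self) -> None:
        if self.time < 0:
            raise ValueError("time must be non-negative")
        if self.event not in (0, 1):
            raise ValueError("event must be 0 or 1")


records: List[SurvivalRecord] = [
    SurvivalRecord(time=365.0, event=1),  # relapsed one year after surgery
    SurvivalRecord(time=500.0, event=0),  # hypothetical: no relapse by day 500
]
observed_events = sum(r.event for r in records)
```

Keeping the time and the event state together in one record makes it explicit that a time value is only meaningful alongside the event state observed at that time.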

5), survival analysis

Survival analysis refers to a family of statistical methods used to explore the timing of an event of interest. For example, explore the probability of occurrence of a target event at a certain time.

When performing survival analysis on a target event, an analysis model is usually built from the feature data that affects the occurrence of the target event in multiple known samples from an experiment (or survey), together with the survival data of those samples. The analysis model is then used to predict the hazard function h(t) of the target event for the sample to be predicted, and h(t) can be used to determine the risk of the target event occurring at different times. The survival data is generally truncated data, for example, data that includes a time and whether the target event has occurred at that time. The time mentioned here may be a survival time or any observation time, which is not limited.

Survival analysis can be applied to, but is not limited to, the following real-world scenarios:

A. Medical and health: Through the survival analysis of the disease course, the prognosis analysis of the disease is realized. Among them, prognosis is the prediction of the development process and consequences of a certain disease. According to whether treatment is received during the occurrence or development of the disease, the prognosis can be divided into natural prognosis and treatment prognosis.

B. Urban construction: Through survival analysis of urban rail equipment, the risk of future failure of that equipment can be predicted. Similarly, through survival analysis of urban water supply pipelines, the risk of pipeline bursts can be predicted.

C. Financial services: Through survival analysis of consumers' installment payments, the risk of default on those installments can be predicted.

6), non-Euclidean data (non-euclidean space data)

Non-Euclidean data is data that is not arranged in a regular order. In a sample composed of non-Euclidean data, the order or position of the data does not affect the characteristics of the sample.

In practice, non-Euclidean data exists in many fields, for example, social network data in social science, sensor networks in communication technology, regulatory networks in genomics, and mesh surfaces in computer graphics.

It can be understood that the amount of non-Euclidean data in real scenarios is very large and its structure is complex.

7) Residual network (ResNet) and residual fully-connected neural network (RFCN)

ResNet is a type of neural network that includes skip connections (also called shortcut connections). These connections let data bypass some network layers as it passes through the network, which mitigates network degradation and vanishing gradients in deep neural networks. They can also speed up training and allow the network to be very deep.

It should be understood that a neural network with many layers is better suited to processing data with complex structure.

RFCN is a neural network based on a fully connected layer and introducing skip connections or shortcut connections.
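A single fully connected layer with a skip connection can be sketched as below. This is an assumed minimal form, y = x + ReLU(Wx + b), written in plain Python for readability. The text does not specify the activation function or the layer layout used in the application, and all names and values here are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> Vector:
    """Element-wise max(0, v); an assumed choice of activation."""
    return [max(0.0, v) for v in x]


def residual_fc_block(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Return x + relu(W x + b); the `x +` term is the skip connection."""
    if len(weights) != len(x):
        raise ValueError("skip connection needs equal input and output width")
    transformed = relu(dense(x, weights, bias))
    return [xi + ti for xi, ti in zip(x, transformed)]


x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights and bias the block reduces to the identity mapping.
identity_out = residual_fc_block(x, zero_w, [0.0, 0.0])
out = residual_fc_block(x, [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

Because the input is added back to the layer output, the block can fall back to passing its input through unchanged, which is what makes stacking many such layers easier than stacking plain fully connected layers.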

8) Other terms

In the embodiments of the present application, words such as "exemplary" or "for example" are used to present an example, instance or illustration. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.

In the embodiments of the present application, the terms "first" and "second" are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features. In the description of the present application, unless otherwise specified, "plurality" means two or more.

It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and covers three cases. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this application generally indicates an "or" relationship between the objects before and after it.

It should be understood that determining B according to A does not mean determining B only according to A, and B may also be determined according to A and/or other information.

It should be understood that "one embodiment", "an embodiment" and "a possible implementation" mentioned throughout the specification mean that specific features, structures or characteristics related to the embodiment or implementation are included in at least one embodiment of this application. Therefore, appearances of "in one embodiment" or "in an embodiment" or "one possible implementation" throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.

When the relationship between the characteristic variables affecting the target event and the occurrence of the target event is nonlinear, one possible implementation of survival analysis of the target event is to build a deep survival analysis (DeepSurv) model using a multi-layer perceptron (MLP). However, the feedforward fully connected neural network of the MLP ignores the hierarchical relationship between the networks, and a feedforward fully connected neural network that is too deep is prone to vanishing gradients. As a result, the prediction accuracy of the deep survival analysis model is low.

In another possible implementation, multiple different weak models (such as multiple coxPH models) are pre-trained on the same sample set and then integrated and fused to obtain an ensemble model (such as a coxPH ensemble model) with better accuracy and generalization ability than the weak models. However, the process of integrating and fusing models is usually complicated, and obtaining an ensemble model in this way is not end to end. In addition, because the ensemble model includes multiple weak models, the differences among them affect the interpretation of the ensemble model to a certain extent.

Based on this, the embodiments of the present application provide a method for predicting a survival hazard ratio. The method predicts the survival hazard ratio of a sample to be predicted using a preset model trained in advance, where the survival hazard ratio represents the survival risk of the sample to be predicted. Based on the survival hazard ratio and a baseline hazard function, a hazard function reflecting the survival curve of the sample to be predicted can be determined, thereby realizing survival analysis of the sample to be predicted.

The above preset model includes a gating network and multiple expert networks. The gating network is used to obtain the weight coefficient corresponding to each expert network according to the sample to be predicted. The survival hazard ratio of the sample to be predicted is the result of a weighted summation of the output values of the multiple expert networks according to the weight coefficient corresponding to each expert network in the preset model.
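The weighted summation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the gating scores are normalized with a softmax (the text only says that the gating network yields weight coefficients), and the two constant "experts" and the rule-based gate are hypothetical stand-ins for trained networks.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn gating scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_hr(
    x: Sequence[float],
    gating_scores: Callable[[Sequence[float]], Sequence[float]],
    experts: Sequence[Expert],
) -> float:
    """Weighted sum of expert outputs, with weights chosen per sample by the gate."""
    weights = softmax(gating_scores(x))
    if len(weights) != len(experts):
        raise ValueError("gate must emit one score per expert")
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Hypothetical stand-ins for trained networks.
experts: List[Expert] = [lambda x: 1.0, lambda x: 3.0]


def gate(x: Sequence[float]) -> List[float]:
    """Toy gate: equal scores for negative inputs, otherwise favor expert 2."""
    return [0.0, 0.0] if x[0] < 0 else [0.0, math.log(3.0)]


hr_neg = predict_hr([-1.0], gate, experts)  # weights 0.5 and 0.5
hr_pos = predict_hr([2.0], gate, experts)   # weights 0.25 and 0.75
```

Because the weights are recomputed for each input, different samples can rely on different experts, and the whole computation is a single differentiable expression when the gate and experts are neural networks, which is what allows end-to-end training.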

Wherein, the preset model provided in the embodiment of the present application can be trained based on an end-to-end method, and the preset model can be regarded as an integrated model after integration and fusion of multiple expert networks according to the weight coefficients generated by the gating network. Therefore, the accuracy of the survival risk rate of the sample to be predicted based on the prediction of the preset model is relatively high, thereby improving the accuracy of the survival analysis of the sample to be predicted based on the risk rate. Wherein, the specific training method of the preset model can refer to the description below, and will not be repeated here.

In addition, the expert network in the above preset model can be implemented by RFCN, so that the preset model can be trained on non-Euclidean data with nonlinear characteristics. Since the amount of non-Euclidean data in real scenarios is very large and its structure is complex, a preset model trained on non-Euclidean data has stronger learning ability and higher prediction accuracy.

The embodiment of the present application also provides a device for predicting the survival risk rate (hereinafter referred to as the prediction device). The prediction device may be any computing device with computing capability, or a set of computing devices composed of multiple computing devices. For example, the prediction device may be a computing device such as a notebook computer or a desktop computer, or may be a server or a collection of servers.

It should be noted that the above-mentioned preset model may be preset in the prediction device. As an example, the preset model may be stored in the prediction device in the form of an application program. In some other embodiments, the preset model may not be preset in the prediction device. For example, the prediction device may call a preset model deployed on the cloud through an application programming interface (API).

Referring to FIG. 2 , FIG. 2 shows a schematic structural diagram of a prediction device provided by an embodiment of the present application. As shown in FIG. 2 , the prediction device 20 includes a processor 21 , a main memory (main memory) 22 , a storage medium 23 , a communication interface 24 and a bus 25 . The processor 21 , the main memory 22 , the storage medium 23 and the communication interface 24 may be connected through a bus 25 .

The processor 21 is the control center of the prediction device 20 and may be a general-purpose central processing unit (CPU). The processor 21 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a graphics processing unit (GPU), a neural processing unit (NPU), a tensor processing unit (TPU), an artificial intelligence (AI) chip, or the like.

As an example, the processor 21 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 2 . In addition, the present application does not limit the number of processor cores in each processor.

The main memory 22 is used to store program instructions, and the processor 21 can execute the program instructions in the main memory 22 to implement the method for predicting the survival risk rate provided by the embodiment of the present application.

In a possible implementation manner, the main memory 22 may exist independently of the processor 21 . The main memory 22 can be connected with the processor 21 through the bus 25, and is used for storing data, instructions or program codes. When the processor 21 invokes and executes the instructions or program codes stored in the main memory 22, the method for predicting the survival risk rate provided by the embodiment of the present application can be realized.

In another possible implementation manner, the main memory 22 may also be integrated with the processor 21 .

The storage medium 23 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM) and direct rambus random access memory (direct rambus RAM, DR RAM). As an example, the storage medium 23 may be used to store the training sample data in the embodiment of the present application.

The communication interface 24 is used to connect the prediction device 20 with other devices (such as terminals) through a communication network. The communication network may be Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like. The communication interface 24 may include a receiving unit for receiving data and a sending unit for sending data.

The bus 25 may be an Industry Standard Architecture (Industry Standard Architecture, ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus, etc. The bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 2 , but it does not mean that there is only one bus or one type of bus.

It should be pointed out that the structure shown in FIG. 2 does not constitute a limitation on the prediction device 20. The prediction device 20 may include more or fewer components than those shown in FIG. 2, may combine certain components, or may have a different arrangement of components.

It should be noted that when the above-mentioned prediction device is a server, the embodiment of the present application also provides a system for predicting the survival risk rate (hereinafter referred to as the prediction system). The prediction system may include a terminal and a server, which are connected and communicate in a wired or wireless manner. The preset model mentioned above is preset in the server.

Wherein, the terminal can be used to receive the data of the sample to be predicted input by the user, and the server can be used to receive the data of the sample to be predicted from the terminal, and return the prediction result to the terminal after processing the received data of the sample to be predicted.

Optionally, the terminal may be a terminal device such as a mobile phone, a notebook computer, or a desktop computer, which is not limited in this embodiment of the present application.

The embodiment of the present application also provides a training device for a preset model (hereinafter referred to as the training device), and the training device may be any computing device with computing capability. For the hardware description of the training device, reference may be made to the hardware description of the prediction device above, and details are not repeated here.

It should be understood that the training device may be the same device as the prediction device described above, or may be a different device, which is not limited in this embodiment of the present application.

The method provided by the embodiment of the present application will be described in detail below with reference to the accompanying drawings.

In the following, the training method of the preset model provided by the embodiment of the present application will be firstly described.

Referring to FIG. 3 , FIG. 3 shows a schematic flowchart of a training method for a preset model provided by an embodiment of the present application. The method may be performed by the training device described above. The method can include:

S101. Obtain a training sample set.

Here, the training sample set includes data of multiple training samples and survival data of each training sample.

For example, if the training sample is a patient, the data of the training sample can be the case data of the patient. For another example, if the training sample is a device, the data of the training sample can be any data related to the device, such as attribute data of the device, production data of the device, and so on.

Wherein, the data of each training sample may include data of multiple features of the training sample, and the data of the features included in the data of each training sample may include non-Euclidean data. It can be understood that the number of features included in each training sample may be the same or different.

Optionally, for any feature of the training samples, the number of training samples in the training sample set that include the feature is greater than a first threshold. The embodiment of the present application does not specifically limit the value of the first threshold. In this way, it can be ensured that a sufficient number of training samples include each feature, so that the contribution of each feature, as determined when the trained preset model is interpreted, is more accurate.
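The coverage condition above can be checked with a short routine. The following is only a minimal sketch: the function name, the dictionary representation of a training sample, and the example feature names are assumptions made for illustration and do not come from the embodiment.

```python
from collections import Counter


def features_below_threshold(samples, first_threshold):
    """Return the features that are NOT included in more than
    `first_threshold` training samples (hypothetical helper)."""
    counts = Counter()
    for sample in samples:
        # Each sample is a dict {feature name: value}; a feature that the
        # sample does not include is simply absent from the dict.
        counts.update(sample.keys())
    return sorted(name for name, c in counts.items() if c <= first_threshold)


# Hypothetical training samples with differing feature sets.
samples = [
    {"age": 0.74, "bmi": 0.31, "heart_rate": 0.20},
    {"age": 0.08, "bmi": 0.34},
    {"age": 0.49, "heart_rate": 0.18},
]
print(features_below_threshold(samples, first_threshold=2))
```

Features returned by such a check would either need more training samples or could be removed before training.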

Table 1 shows an example of a training sample set. As shown in Table 1, the training sample set includes data of n training samples, and the n training samples are respectively training sample 1, training sample 2, training sample 3, . . . , and training sample n. Each training sample includes data of m features, and the m features are feature 1, feature 2, feature 3, . . . , and feature m. Wherein, both n and m are positive integers.

Table 1

Sample	Feature 1	Feature 2	Feature 3	...	Feature m
Training sample 1	0.74	0.31	0.20	...	0.58
Training sample 2	0.08	0.34	0.20	...	0.12
Training sample 3	0.49	0.74	0.18	...	0.78
...	...	...	...	...	...
Training sample n	0.78	0.84	0.56	...	0.62

It should be understood that the features included in the training samples are related to the application scenarios of the preset model trained by the training samples.

Exemplarily, if the preset model is applied to survival analysis in a hospital setting, the feature data of a training sample may include multiple kinds of data among: the patient's basic data (including the patient's age, body mass index (BMI), etc.), the patient's blood test data (including the patient's routine blood data, blood cell data, liver function data, kidney function data, etc.), the patient's vital sign data (including the patient's body temperature, pulse, heart rate, blood pressure, respiration, blood oxygen, etc.), and the patient's treatment records (including the drug name, drug type and drug dosage during drug treatment, as well as plasma therapy, oxygen therapy, etc.).

As another example, if the preset model is applied to the survival analysis of part quality in an industrial scenario, the feature data of a training sample may include multiple kinds of data such as the material data of the part, the process data of producing the part, and the delivery time of the part.

It can be seen that a training sample can have hundreds of types of features, and the data of these features include non-Euclidean data.

In addition, the survival data of each training sample is used as the real value (or label value) when the loss function is calculated during training of the preset model. Each training sample corresponds to exactly one piece of survival data, and the survival data is truncated data (i.e., time-to-event data).

Exemplarily, Table 2 shows a set of survival data. Wherein, survival data 1 may be the survival data of training sample 1 in Table 1, survival data 2 may be the survival data of training sample 2 in Table 1, and survival data 3 may be the survival data of training sample 3 in Table 1, ..., the survival data n can be the survival data of the training sample n in Table 1.

Taking survival data 1 as an example, survival data 1 indicates that on the 10th day the ending event of training sample 1 has occurred, that is, the event status value is "True/1". Taking survival data 2 as an example, survival data 2 indicates that on the 14th day the ending event of training sample 2 has not occurred, that is, the event status value is "False/0". The remaining survival data are not described one by one.

Table 2

Survival data	Time (days)	Event
Survival data 1	10	True/(1)
Survival data 2	14	False/(0)
Survival data 3	9	False/(0)
...	...	...
Survival data n	20	True/(1)
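As an illustration of how the time/event pairs in Table 2 might be held in memory, the following sketch stores each piece of survival data as a small record. The type name `SurvivalRecord`, its field names, and the helper function are hypothetical; the values are those shown in Table 2.

```python
from typing import NamedTuple


class SurvivalRecord(NamedTuple):
    time_days: int  # observation time, in days
    event: bool     # True if the ending event had occurred at that time


# Rows of Table 2 (the last entry stands for "survival data n").
survival_data = [
    SurvivalRecord(10, True),
    SurvivalRecord(14, False),
    SurvivalRecord(9, False),
    SurvivalRecord(20, True),
]


def observed_event_times(records):
    """Times at which an ending event was actually observed."""
    return [r.time_days for r in records if r.event]


print(observed_event_times(survival_data))
```

Records whose event value is False only state that the ending event had not occurred by the recorded day, which is why a loss function suited to this kind of data is needed during training.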

Optionally, the training device can obtain the training sample set from an external storage device. Wherein, the training sample set is pre-stored in the external storage device.

Optionally, the training device may also receive the training sample set from other devices through a communication interface (such as the communication interface 24 shown in FIG. 2 ). Wherein, the training sample set is pre-stored in the other device.

It should be noted that the training sample set acquired by the training device may be a preprocessed training sample set or a non-preprocessed training sample set.

When the training sample set obtained by the training device has not been preprocessed, the training device can preprocess the training sample set after obtaining it. The embodiment of the present application does not limit the specific content of the preprocessing.

As an example, preprocessing the training sample set may include: deleting abnormal training samples from the training sample set (for example, training samples whose number of features is less than the first threshold); deleting abnormal feature data from the training samples (for example, a case sample in which the patient's height is recorded as 10 m); deleting features whose number of missing values across all training samples in the training sample set is greater than a second threshold (that is, features that are absent from more than the second threshold number of training samples); or normalizing the feature data. This is not specifically limited in this embodiment of the present application.
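The preprocessing options listed above can be sketched as a single pass over the data. This is a simplified illustration under stated assumptions: samples are dictionaries, `min_features` and `max_missing` stand in for the first and second thresholds, min-max scaling is chosen as the normalization, and the removal of abnormal feature values is omitted. None of these choices is prescribed by the embodiment.

```python
def preprocess(samples, min_features, max_missing):
    """Filter and min-max normalize a list of {feature: value} dicts."""
    # Step 1: delete training samples that include too few features.
    kept = [s for s in samples if len(s) >= min_features]
    # Step 2: delete features absent from more than `max_missing` samples.
    names = set()
    for s in kept:
        names.update(s)
    usable = {n for n in names
              if sum(1 for s in kept if n not in s) <= max_missing}
    # Step 3: normalize every remaining feature to the range [0, 1].
    bounds = {}
    for n in usable:
        values = [s[n] for s in kept if n in s]
        bounds[n] = (min(values), max(values))
    result = []
    for s in kept:
        row = {}
        for n in usable:
            if n in s:
                lo, hi = bounds[n]
                row[n] = 0.0 if hi == lo else (s[n] - lo) / (hi - lo)
        result.append(row)
    return result


# Hypothetical raw case data.
raw = [
    {"age": 30, "bmi": 20, "temp": 36.5},
    {"age": 50, "bmi": 30},
    {"age": 70},
    {"age": 40, "bmi": 25, "temp": 37.5},
]
clean = preprocess(raw, min_features=2, max_missing=0)
```

With these inputs the third sample is deleted for having a single feature, and the `temp` feature is deleted because one remaining sample does not include it.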

S102. Train the initial model by using the data of the training samples in the training sample set to obtain a preset model.

Specifically, the training device may iteratively train the initial model based on the acquired data of the training samples in the training sample set, so as to obtain the preset model. Here, the initial model may be a model pre-designed by the designer, and the initial model is designed as a model for predicting the survival risk rate of the sample.

Wherein, the initial model may include a gating network and multiple expert networks. The gating network can be, for example, a neural network classifier, and the expert network can be, for example, RFCN.

Each time the model is trained, the gating network obtains the weight coefficient corresponding to each expert network according to the received training sample, and the multiple expert networks each learn the training sample and output their own learning results. The learning results of the multiple expert networks are then weighted and summed according to the weight coefficients obtained by the gating network to obtain the output value (or output result) of the current model. This output value is the predicted value obtained after the current model learns the training sample, that is, the survival risk rate of the training sample as predicted by the current model.

Specifically, the gating network can learn the type of the received training samples, and then assign corresponding weights to each expert network according to the learned type. Here, the gating network is a network that learns and classifies the training samples autonomously, and the number of types after the gating network classifies the training samples is equal to the number of expert networks included in the initial model.

It can be seen that when the gating network assigns corresponding weights to each expert network, one type of training sample corresponds to one expert network.

Further, the weights of the plurality of expert networks determined by the gating network may be normalized by a preset function to obtain weight coefficients corresponding to the plurality of expert networks. Wherein, the sum of the normalized weight coefficients is 1.

Exemplarily, the weights of the plurality of expert networks determined by the gating network can be exponentially normalized through the softmax function. Here, the process of exponentially normalizing multiple data through the softmax function will not be described in detail.
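For reference, the standard form of exponential normalization can be written as follows. The symbols g_i (the weight output by the gating network for the i-th expert network) and c_i (the resulting weight coefficient) are notation introduced here for illustration only.

$$c_i = \frac{\exp(g_i)}{\sum_{j=1}^{N} \exp(g_j)}, \qquad \sum_{i=1}^{N} c_i = 1$$

Because every exponential is positive, each c_i lies strictly between 0 and 1, and an expert network with a larger weight g_i always receives a larger weight coefficient.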

In this way, the results output by the expert networks are multiplied by the weight coefficient corresponding to each expert network and then summed (i.e., a weighted sum), which gives the predicted value obtained by the current model after learning the received training sample. This process can be represented by the following formula (2). Each expert network is used to learn the training sample and predict the survival risk rate of the training sample.

$$F(x) = \sum_{i=1}^{N} \mathrm{Softmax}(G(x), \tau)_i \cdot f_i(x) \qquad \text{Formula (2)}$$

Here, x represents the training sample, N is the number of expert networks, and i denotes the i-th expert network among the N expert networks. F(x) represents the survival risk rate (that is, the predicted value) output by the model after learning the training sample x. G(x) represents the weights of the N expert networks output by the gating network. τ represents the temperature coefficient, which indicates how smooth the normalized result is when Softmax exponentially normalizes the multiple weights, and is usually preset. Softmax(G(x), τ)_i represents the weight coefficient corresponding to the i-th expert network, obtained by exponentially normalizing the weights G(x) output by the gating network, and f_i(x) represents the result obtained after the i-th expert network learns and processes the training sample x.

As an example, suppose the training sample set used to train the initial model includes two types of training samples (for example, male training samples and female training samples). Referring to FIG. 4, FIG. 4 shows a schematic structural diagram of an initial model provided by the embodiment of the present application. As shown in FIG. 4, the initial model 40 includes a gating network 41 and two expert networks, namely an expert network 421 and an expert network 422.
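The weighted summation described by formula (2) can be sketched in a few lines. This is a hedged illustration rather than the embodiment's implementation: dividing the weights by τ before exponentiation is the common temperature convention and is assumed here, and the gating function and the two expert functions are hypothetical stand-ins for trained networks.

```python
import math


def softmax(weights, tau=1.0):
    """Exponentially normalize `weights`; a larger tau gives a smoother result.
    Dividing by tau is an assumed (commonly used) temperature convention."""
    scaled = [w / tau for w in weights]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_output(x, gate, experts, tau=1.0):
    """F(x): sum over i of Softmax(G(x), tau)_i * f_i(x)."""
    coeffs = softmax(gate(x), tau)
    return sum(c * f(x) for c, f in zip(coeffs, experts))


# Hypothetical stand-ins for the gating network and two expert networks.
def gate(x):
    return [x[0], -x[0]]


def expert_a(x):
    return 2.0 * x[1]


def expert_b(x):
    return 0.5 * x[1]


prediction = mixture_output([0.0, 1.0], gate, [expert_a, expert_b])
```

In this example the gating weights are equal, so both expert networks receive a weight coefficient of 0.5 and the prediction is the plain average of their outputs.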

Wherein, after the initial model 40 receives the input training sample 1, the expert network 421 performs learning processing on the training sample 1 to obtain the result 1, and the expert network 422 performs learning processing on the training sample 1 to obtain the result 2.

After learning and processing the training sample 1 , the gating network 41 can output weight 1 for the expert network 421 and output weight 2 for the expert network 422 based on the learned type of the training sample 1 . Then, the softmax function exponentially normalizes the two weights output by the gating network to obtain the weight coefficient 1 of the expert network 421 and the weight coefficient 2 of the expert network 422 .

In this way, the predicted value output by the initial model after learning training sample 1 is obtained by adding the product of weight coefficient 1 and result 1 output by the expert network 421 to the product of weight coefficient 2 and result 2 output by the expert network 422. This predicted value is the survival risk rate of training sample 1 as predicted by the initial model after learning training sample 1.
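A worked numerical instance of this two-expert case is given below. All numbers are hypothetical and chosen only to keep the arithmetic exact: weight 1 is ln 3, weight 2 is 0, the temperature coefficient is taken as 1, result 1 is 2.0 and result 2 is 1.0.

$$c_1 = \frac{e^{\ln 3}}{e^{\ln 3} + e^{0}} = \frac{3}{4} = 0.75, \qquad c_2 = \frac{e^{0}}{e^{\ln 3} + e^{0}} = \frac{1}{4} = 0.25$$

$$F(x) = c_1 \cdot 2.0 + c_2 \cdot 1.0 = 1.5 + 0.25 = 1.75$$

The predicted value lies closer to result 1 because the expert network 421 received the larger weight coefficient.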

It should be noted that any expert network among the plurality of expert networks in the above initial model may include at least one candidate RFCN.

When any one of the expert networks includes one candidate RFCN, the output result of the candidate RFCN after learning and processing the training samples is the output result of any expert network after learning and processing the training samples.

When any one of the expert networks includes multiple candidate RFCNs, that expert network also includes an evaluation module. The evaluation module is used to evaluate the result obtained after each candidate RFCN learns the training sample, and to use the result that satisfies a preset condition as the output of the expert network. The output result that satisfies the preset condition may be, among the output results of the multiple candidate RFCNs, the output result closest to the label value of the sample. In this way, the accuracy of model prediction can be improved.

As an example, for any expert network, the evaluation module can calculate the loss function of each candidate RFCN in the expert network based on the result obtained after that candidate RFCN learns training sample a (that is, the predicted value of training sample a output by that candidate RFCN) and the survival data of training sample a. Then, the output result of the candidate RFCN whose loss function has the smallest value (the smallest loss function means that the predicted value is closest to the real value) is used as the output value of the expert network.
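The selection performed by the evaluation module can be sketched as a minimum-loss lookup. The function name and the example values are hypothetical, and the per-candidate loss values are taken as given because the embodiment does not fix how they are computed.

```python
def select_output(candidate_outputs, candidate_losses):
    """Return (index, output) of the candidate RFCN with the smallest loss."""
    if len(candidate_outputs) != len(candidate_losses) or not candidate_losses:
        raise ValueError("need one loss per candidate and at least one candidate")
    best = min(range(len(candidate_losses)), key=lambda i: candidate_losses[i])
    return best, candidate_outputs[best]


# Hypothetical predicted values and loss values of three candidate RFCNs.
outputs = [1.20, 0.90, 1.50]
losses = [0.40, 0.15, 0.62]
best_index, expert_output = select_output(outputs, losses)
```

Here the second candidate has the smallest loss, so its predicted value becomes the output value of the expert network.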

Wherein, the embodiment of the present application does not specifically limit the specific implementation manner of evaluating the result with the best performance from the learning results of multiple candidate RFCNs.

It should be understood that the network structures of multiple candidate RFCNs in the same expert network are different. The different network structures of the candidate RFCNs, for example, may be different network structures/layers skipped by the skip connections or shortcut connections of the candidate RFCNs, which is not limited in this embodiment of the present application.

It should also be understood that for the multiple expert networks included in the initial model, the candidate RFCN sets included in each expert network are different from each other. Optionally, there may be an intersection among candidate RFCN sets included in each of the plurality of expert networks.

Taking an initial model that includes 3 expert networks as an example, expert network 1 may include candidate RFCN 1, candidate RFCN 2, and candidate RFCN 3; expert network 2 may include candidate RFCN 1 and candidate RFCN 2; and expert network 3 may include candidate RFCN 3 and candidate RFCN 4.

As an example, refer to FIG. 5 , which shows a schematic structural diagram of an expert network provided by an embodiment of the present application. As shown in FIG. 5 , the expert network 421 includes three candidate RFCNs, which are candidate RFCN 511 , candidate RFCN 512 and candidate RFCN 513 . The expert network 421 also includes an evaluation module 52 .

As shown in Figure 5, when the expert network 421 receives the training sample 1, the candidate RFCN 511 can learn and process the training sample 1 to obtain the result 1. Similarly, the candidate RFCN 512 can learn and process the training sample 1 to obtain the result 2, and the candidate RFCN 513 can learn and process the training sample 1 to obtain the result 3.

Then, the evaluation module 52 may evaluate the result 1, the result 2 and the result 3, and determine the result with the best performance. For example, the evaluation module 52 determines that the result with the best performance is the result 2, and the expert network 421 outputs the result 2.

In this way, the training device iteratively trains the initial model with the structure described above based on the acquired training samples to obtain the preset model. Specifically, this process can be described as follows:

The training device inputs the training sample 1 in the training sample set into the model to be trained. Here, when the training device inputs a training sample to the model to be trained for the first time, the model to be trained is the initial model described above.

In this way, after the model to be trained receives the training sample 1 input by the training device, each expert network in the model to be trained can perform learning processing on the training sample 1 and output respective learning results. The learning result output by each expert network is the predicted value output by each expert network after learning the training sample 1.

The gating network in the model to be trained learns and classifies the training sample 1, and outputs the weight corresponding to each expert network based on the learned type. Next, the training device performs normalization processing on the weights corresponding to each expert network, so as to determine the weight coefficients corresponding to each expert network.

It can be understood that when the model to be trained is the initial model, that is, when the training device trains the initial model for the first time, the gating network, after learning training sample 1, may output the weight corresponding to each expert network randomly according to the learning result. The expert network with the largest weight can be regarded as the expert network corresponding to the type of the current training sample 1.

Next, the training device multiplies the predicted values output by the multiple expert networks by the weight coefficient of each expert network and sums the products, so as to obtain the predicted value of training sample 1 output by the model to be trained. It should be understood that the predicted value of training sample 1 is the survival risk rate of training sample 1 as predicted by the model to be trained.

Then, the training device can calculate the loss function based on the predicted value output by the model to be trained and the survival data of training sample 1 (that is, the real value, or the label value of the training sample). Since the survival data is truncated data, optionally, in this embodiment of the present application, a loss function for truncated data may be calculated based on a negative log-likelihood (NLL) score.
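One common negative log-likelihood for time-to-event data is the negative log partial likelihood of the Cox model. The sketch below uses it as an assumed example only; the embodiment does not state which NLL form is used. It also operates on a small batch of samples rather than a single sample, because each observed event is compared against all samples still under observation at that time.

```python
import math


def neg_log_partial_likelihood(hazard_ratios, times, events):
    """Average negative log partial likelihood over observed events.

    hazard_ratios: predicted survival risk rates (positive numbers)
    times:         observation times
    events:        True where the ending event occurred
    """
    loss = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # samples without an event only appear in risk sets
        # Risk set: every sample still under observation at time t_i.
        risk_sum = sum(hr for hr, t_j in zip(hazard_ratios, times) if t_j >= t_i)
        loss -= math.log(hazard_ratios[i]) - math.log(risk_sum)
        n_events += 1
    return loss / n_events if n_events else 0.0


# Times and events from Table 2; the predicted values are hypothetical.
times = [10, 14, 9, 20]
events = [True, False, False, True]
uniform = neg_log_partial_likelihood([1.0, 1.0, 1.0, 1.0], times, events)
sharper = neg_log_partial_likelihood([2.0, 1.0, 1.0, 1.0], times, events)
```

Raising the predicted risk of the sample whose event occurred earliest (the first sample) lowers the loss, which is the behavior expected of a loss that rewards ranking high-risk samples first.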

It should be understood that the training device may calculate the loss function of the model to be trained based on the predicted value output by the model to be trained and the survival data of training sample 1. The loss function of the model to be trained is back-propagated, and the network parameters of each expert network are adjusted according to the weight coefficient of that expert network. It can be understood that the adjustment amount of the network parameters of an expert network is positively correlated with the weight coefficient of the expert network. For example, the network parameters of an expert network with a large weight coefficient are adjusted by a relatively large amount, and the network parameters of an expert network with a small weight coefficient are adjusted by a small amount.

It should also be understood that the training device may also calculate the loss function corresponding to each expert network based on the output value of each expert network and the survival data of training sample 1. The parameters of the gating network are then adjusted based on the expert network with the smallest loss function among the multiple expert networks and on the weight coefficient that the gating network assigned to that expert network. As a result, the next time the gating network receives a training sample with the same or similar features as training sample 1, it assigns a larger weight to the expert network with the smallest loss function, so that in subsequent training this expert network is dedicated to learning training samples with the same or similar features as training sample 1. In this way, through repeated learning, one expert network learns only one class of samples with the same or similar features. It should be understood that, because the output value of an expert network with a large weight accounts for a large proportion of the output value of the model to be trained, the adjustment amount is relatively large when the network parameters of that expert network are adjusted based on the loss function of the model to be trained, so an expert network with a large weight can learn more features of the training samples.

In this way, after the loss function is calculated based on the predicted value obtained by the model to be trained from processing training sample 1, and the network parameters of the model to be trained are adjusted accordingly, one round of training of the model to be trained with training sample 1 is complete.

Then, the training device can input training sample 2 into the updated model to be trained and, with reference to the process of training the model to be trained with training sample 1, complete one round of training of the updated model with training sample 2.
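The dependence of the adjustment amount on the weight coefficient follows from the chain rule. In the notation below, which is introduced here for illustration, L is the loss function of the model, θ_i denotes the network parameters of the i-th expert network, and c_i = Softmax(G(x), τ)_i is its weight coefficient; the gating output is assumed not to depend on θ_i.

$$F(x) = \sum_{i=1}^{N} c_i \, f_i(x; \theta_i) \quad\Longrightarrow\quad \frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial F} \cdot c_i \cdot \frac{\partial f_i(x; \theta_i)}{\partial \theta_i}$$

The gradient reaching the i-th expert network is therefore scaled by c_i, so under a gradient-based update an expert network with a larger weight coefficient receives a proportionally larger share of the correction.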

It should be noted that when the gating network assigns weights to each expert network after learning the training sample 2, it can refer to the classification when learning the training sample 1, and assign a larger weight to the expert network corresponding to the type of training sample 2 .

Similarly, the training device may execute the above process multiple times based on the training samples in the training sample set to implement iterative training of the initial model. When the training converges, the preset model provided by the embodiment of the present application is obtained. The gating network in the preset model is used to classify samples, and the expert networks in the preset model are used to predict the survival risk rates of different types of samples. It can be understood that the framework structure of the preset model is the same as that of the above-mentioned initial model.

The preset model trained by the method described in S101-S102 above can process a sample to be predicted to predict its survival risk rate. The survival curve of the sample to be predicted is then determined according to this survival risk rate, thereby realizing the survival analysis of the sample to be predicted.

Referring to FIG. 6, FIG. 6 shows a schematic flowchart of a method for predicting the survival risk rate provided by an embodiment of the present application. The method can be executed by the prediction device shown in FIG. 2, in which the preset model trained by the method described in S101-S102 is preset. The method can include:

S201. Obtain data of samples to be predicted.

Wherein, the detailed description of the prediction device acquiring the data of the sample to be predicted can refer to the description of the training device acquiring the training sample in S101 above, which will not be repeated here.

S202. Process the data of the sample to be predicted by using a preset model to obtain the survival risk rate of the sample to be predicted.

Specifically, the prediction device may input the acquired data of the sample to be predicted into a preset model, and process the data of the sample to be predicted through the preset model to obtain the survival risk rate of the sample to be predicted.

Wherein, the survival risk rate of the sample to be predicted can be used to determine the survival curve of the sample to be predicted, so that the survival analysis of the sample to be predicted can be performed.

Among them, the preset model processes the data of the above-mentioned samples to be predicted to obtain the survival risk rate of the samples to be predicted. You can refer to the process of processing the training sample 1 by the model to be predicted in S102 above to obtain the predicted value of the training sample 1. The description of the process is not repeated here.

In this way, the risk function of the sample to be predicted can be determined based on the survival risk rate of the sample to be predicted predicted by the prediction model and the above-mentioned formula (1).

It should be understood that the feature data of the sample to be predicted can be used as the covariates x₁, x₂, ..., x_p in formula (1), and exp b₁ · exp b₂ · ... · exp b_p is the survival risk rate of the sample to be predicted as predicted by the preset model.

In this way, after the risk function of the sample to be predicted is determined, the risk function can reflect the survival curve of the sample to be predicted. For example, if the risk value of the sample to be predicted is high at a certain time, it means that the survival rate of the sample to be predicted at this time is low.
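
As a rough sketch of how a risk function and a survival curve might be derived from a predicted survival risk rate, the following Python code assumes the standard Cox relation in which the sample's hazard equals a baseline hazard multiplied by the predicted value, and assumes a baseline hazard that is constant within each unit time interval. The function names and the baseline values are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def hazard_curve(baseline_hazard: List[float], hr: float) -> List[float]:
    """Scale the baseline hazard of each time interval by the predicted HR."""
    return [h0 * hr for h0 in baseline_hazard]


def survival_curve(baseline_hazard: List[float], hr: float, dt: float = 1.0) -> List[float]:
    """Survival probability at the end of each interval: exp(-cumulative hazard)."""
    cumulative = 0.0
    curve = []
    for h in hazard_curve(baseline_hazard, hr):
        cumulative += h * dt
        curve.append(math.exp(-cumulative))
    return curve


# Hypothetical baseline hazard for three consecutive time intervals.
baseline = [0.01, 0.02, 0.04]
low_risk = survival_curve(baseline, hr=1.0)
high_risk = survival_curve(baseline, hr=2.0)
```

With the same baseline, the sample with the larger predicted value has a lower survival probability at every time point, which matches the statement that a high risk value corresponds to a low survival rate.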

In this way, in the method for predicting the survival risk rate provided by the embodiment of the present application, the preset model used to predict the sample to be predicted is obtained by training based on non-Euclidean data, and the preset model is equivalent to an integration and fusion of a plurality of expert networks. Therefore, the survival risk rate predicted by this method for the sample to be predicted is relatively accurate. Accordingly, the accuracy of the risk function of the sample to be predicted, which is determined based on the survival risk rate, is improved, and the survival curve of the sample to be predicted can be accurately reflected.

In addition, it can be seen from the above method of training the model that each training sample used to train the preset model includes many features, and the contributions of the individual features of a training sample to the predicted value output by the trained preset model are not the same. Therefore, in practical applications, if the contribution of each feature in the training sample to the predicted value output by the preset model can be determined, then the degree of influence of each feature on the occurrence of the target event can be determined, for example, the degree of impact of different treatments on the survival time of patients. In this way, based on the degree of influence of each feature on the occurrence of the target event, the optimization and improvement of samples in real scenes can be guided.

To achieve the above purpose, the embodiment of the present application can determine the contribution of each feature in the sample to the predicted value of the sample by interpreting the preset model. Alternatively, the embodiment of the present application may also analyze the cause of the predicted value of the sample by explaining the sample. Here, the method for interpreting the preset model described in the embodiments of the present application, or the method for interpreting the sample, can be executed by any device that has computing power and is preset with the preset model described above. To simplify the description, the embodiments of the present application are described below by taking the method of interpreting the preset model and samples performed by the prediction device as an example.

The interpretation of the preset model by the prediction device may include at least one of the following: explaining the preset model itself, explaining an expert network in the preset model, or explaining the gating network in the preset model.

Taking the prediction device interpreting the preset model itself as an example, the prediction device can obtain a beeswarm plot for explaining the preset model according to the preset model and a plurality of training samples. Here, the beeswarm plot is used to show the contribution of each feature in the sample to the predicted value output by the preset model.

Specifically, the prediction device may input the multiple training samples into the preset model respectively, so as to obtain the predicted value corresponding to each of the multiple training samples. Then, the prediction device may draw the beeswarm plot based on the feature data of the multiple training samples and their corresponding predicted values. The prediction device may draw the beeswarm plot based on a Shapley value (SHAP value) method. The embodiment of the present application does not describe the specific implementation process of the Shapley value method in detail.
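
Since the specific implementation is not described here, the following Python code is only a small sketch of one way per-feature contributions of this kind can be computed: exact Shapley values obtained by enumerating feature subsets, where an "absent" feature is replaced by its value in a reference sample. The function `shapley_values`, the reference sample `baseline` and the toy linear model are assumptions made for illustration; exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley contribution of each feature of x relative to baseline."""
    n = len(x)

    def value(present: set) -> float:
        # Features outside `present` are replaced by the reference sample's values.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model and sample, used only to exercise the function.
toy_model = lambda v: 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]
sample = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)
```

The contributions sum to the difference between the model output for the sample and the output for the reference sample, which is the property that lets a plot attribute a predicted value to individual features, with positive contributions raising it and negative contributions lowering it.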

Referring to FIG. 7, FIG. 7 shows a schematic diagram of a method for explaining a preset model provided by an embodiment of the present application. As shown in FIG. 7, a block diagram of the preset model may be displayed on the interface 70 on the display screen of the prediction device. It should be understood that the interface 70 can be a model interpretation interface in the user interface of the preset model, and the block diagram on the interface 70 includes interface buttons for the gating network 71 and for two expert networks in the preset model (the expert network 711 and the expert network 712).

As shown in (a) in FIG. 7, when the user clicks the "input" button on the interface 70 with the mouse, the samples to be input can be selected on the sample input interface, and after confirmation, the multiple locally stored training samples are input into the preset model. Then, after the user clicks the "output" button on the interface 70, the display screen of the prediction device can display a beeswarm plot for explaining the preset model, such as the beeswarm plot shown in (b) in FIG. 7.

As shown in (b) of FIG. 7, in the beeswarm plot displayed on the interface 71, the darker the gray, the larger the feature value, and the lighter the gray, the smaller the feature value. In addition, the abscissa of the beeswarm plot represents the contribution of the feature to the predicted value output by the preset model.

It can be seen that, for feature 1, when the feature value is large, the contribution of feature 1 to the predicted value output by the preset model is negative, and the larger the feature value (that is, the darker the gray), the smaller the contribution (the larger the absolute value of the negative value, the smaller the contribution). Conversely, when the feature value of feature 1 is small, its contribution to the predicted value output by the preset model is positive, and the smaller the feature value (that is, the lighter the gray), the greater the contribution (the larger the positive value, the greater the contribution).

Similarly, for feature 9, when the feature value is large, the contribution of feature 9 to the predicted value output by the preset model is positive, and the larger the feature value (that is, the darker the gray), the greater the contribution (the larger the positive value, the greater the contribution). Conversely, when the feature value of feature 9 is small, its contribution to the predicted value output by the preset model is negative, and the smaller the feature value (that is, the lighter the gray), the smaller the contribution (the larger the absolute value of the negative value, the smaller the contribution).

It can be understood that, to explain the gating network in the preset model, the user can input samples through the "input" button, click the button "gating network 71" on the interface 70, and then click the "output" button, so that the contribution of the sample features to the output value of the gating network can be obtained. Similarly, to explain any expert network in the preset model, the user can input samples through the "input" button, click the button corresponding to that expert network on the interface 70, and then click the "output" button, so that the contribution of the sample features to the output value of that expert network can be obtained.

In this way, by using multiple samples to explain the preset model and determining the degree of influence of each feature of the sample on the output value of the preset model, relevant guidance in the real scene can be realized.

For example, suppose the data of the multiple samples used to explain the preset model is the case data of multiple cancer patients. If model interpretation determines that the use of a certain drug treatment (the use of the drug treatment being a feature of the sample) makes a large contribution to reducing the survival risk rate of cancer patients, this indicates that the drug treatment can improve the survival rate of cancer patients. In this way, clinicians can be guided in the medication of cancer patients.

In addition, when any sample needs to be explained, the prediction device can obtain a beeswarm plot for explaining that sample according to the sample and the preset model. Here, the beeswarm plot is used to show the contribution of each feature in the sample to the predicted value of the sample output by the preset model, so that the cause of the predicted value of the sample can be analyzed.

Specifically, the prediction device may input the sample to be explained into the preset model, so as to obtain the predicted value of the sample to be explained. Then, the prediction device may draw a beeswarm plot based on the feature data of the sample to be explained and the predicted value of the sample to be explained.

As an example, take the sample shown in Table 3 as the sample to be explained:

Table 3

Feature	PFS	Age	RM	NOX	RAD	LSTAT
Sample to be explained	15.3	65.2	6.575	0.538	1	4.98

Referring to FIG. 7, when the user clicks the "input" button on the interface 70 with the mouse, the sample to be explained shown in Table 3 is input into the preset model. Next, the user can click the "output" button on the interface 70, and the display screen of the prediction device can display a sample explanation diagram for explaining the sample to be explained, such as the sample explanation diagram shown in (c) in FIG. 7.

As shown in (c) of FIG. 7, in the sample explanation diagram displayed on the interface 72, the predicted value of the sample to be explained is 24.1. The arrows in the black area point in the direction in which the predicted value increases, and the arrows in the white area point in the direction in which the predicted value decreases. It can be seen that the feature LSTAT contributes the most to increasing the predicted value of the sample to be explained (that is, it has the longest black bar), and the feature RM contributes the most to reducing the predicted value of the sample to be explained (that is, it has the longest white bar).

In this way, through the interpretation of a single sample, the degree of influence of different features in the sample on its survival risk rate can be determined, and relevant guidance can then be given for the sample. For example, suppose the single sample is a part. If interpreting the part through the preset model shows that the material of the part being material a makes a large contribution to increasing the survival risk rate of the part, this means that a part manufactured from material a has a low survival rate, that is, a short service life. This can guide the manufacturer to avoid using material a to make the part.

It can be seen that by interpreting the model or samples, the optimization and improvement of samples in real scenes can be guided at different levels.

In other embodiments, the gating network in the preset model trained in the embodiments of the present application is essentially a classifier. Therefore, the embodiment of the present application can also classify the samples in a sample set based on the gating network in the above preset model. In this way, the samples in the sample set can be divided into multiple groups according to their types, that is, the samples in any one group are samples of the same type. For example, a gating network can divide patients' electronic medical record samples into a group of male samples and a group of female samples.
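
One possible way to use the gating network as a classifier is sketched below in Python. The sketch assumes that each sample is assigned to the group of the expert network that receives the largest weight coefficient; this assignment rule, the function `group_by_gate` and the threshold-based `toy_gate` are hypothetical illustrations rather than the grouping rule of the embodiment.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def group_by_gate(
    samples: Sequence[Sequence[float]],
    gate_weights: Callable[[Sequence[float]], Sequence[float]],
) -> Dict[int, List[int]]:
    """Map each group label to the indices of the samples assigned to it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, sample in enumerate(samples):
        weights = gate_weights(sample)
        # Assumed rule: the label is the index of the largest weight coefficient.
        label = max(range(len(weights)), key=weights.__getitem__)
        groups[label].append(idx)
    return dict(groups)


# Hypothetical gate: favours expert 0 for small first-feature values.
toy_gate = lambda s: [0.9, 0.1] if s[0] < 50 else [0.3, 0.7]
toy_samples = [[34.0], [71.0], [45.0], [66.0]]
groups = group_by_gate(toy_samples, toy_gate)
```

The resulting index lists can then be used to select the survival data of each group.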

In this way, based on the divided groups of samples, the survival curve corresponding to each group of samples can be drawn. It can be understood that the survival data of each sample in each group of samples here is known. Through this method, the comparative analysis of the survival curves of different types of samples is realized.
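
The text does not say how the survival curve of each group is estimated from the known survival data. A common choice is the Kaplan-Meier estimator, sketched below in Python as an assumption; `durations` holds the observed time of each sample, and `events` marks whether the target event was observed (True) or the sample was censored (False). The function name and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time where an event occurred."""
    order = sorted(zip(durations, events))
    at_risk = len(order)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Collect every sample whose observed time equals t.
        while i < len(order) and order[i][0] == t:
            deaths += 1 if order[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored samples both leave the risk set
    return curve


# Hypothetical survival data for one group of samples.
toy_durations = [2.0, 3.0, 3.0, 5.0, 8.0]
toy_events = [True, True, False, True, False]
curve = kaplan_meier(toy_durations, toy_events)
```

Calling the function once per group and plotting the resulting step curves in the same coordinate system gives the kind of side-by-side comparison described above.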

As an example, refer to FIG. 8, which shows a schematic diagram of the survival curves of the sample groups obtained after the samples in a sample set are grouped by the preset model provided by an embodiment of the present application. Take the samples to be patients' electronic medical records, with 177 electronic medical record samples in total. After the gating network in the preset model provided by the embodiment of the present application divides the samples into sample group 1 and sample group 2, where sample group 1 includes 150 samples and sample group 2 includes 27 samples, the survival curves of sample group 1 and sample group 2 can be drawn in the same coordinate system based on the survival data of the two groups. Survival curve 1 shown in FIG. 8 is the survival curve of sample group 1, and survival curve 2 is the survival curve of sample group 2. In this way, the difference in survival rate between sample group 1 and sample group 2 at the same time can be seen intuitively from the figure.

In this way, by comparing the survival curves of different types of samples, the differences between them can be seen. Domain experts can then search for and analyze the common characteristics of each type of sample to determine the factors that decide the survival rate in the survival curve, and use these findings to guide practice.

Taking the survival curves of sample group 1 and sample group 2 shown in FIG. 8 as an example, the survival rate of survival curve 1 (sample group 1) is generally lower than the survival rate of survival curve 2 (sample group 2). In this way, experts in the field (that is, clinicians) can find the common features in each group of samples through professional medical analysis, and these common features may be decisive factors affecting the survival rate of that group of samples. Based on the analysis results, clinicians can be guided to adjust the patient's treatment plan.

In order to further illustrate the concordance of the preset model in the method provided in the embodiment of the present application, specific examples are described below:

Example 1. Prediction model for the efficacy of lung cancer drug A

Specifically, taking the pre-collected clinical efficacy data of lung cancer drug A on 385 patients as an example, the embodiment of the present application divides the 385 samples into three sample sets. Sample set 1 includes 177 samples, sample set 2 includes 106 samples, and sample set 3 includes 102 samples. Moreover, the quality of sample set 1 is higher than that of sample set 2, and the quality of sample set 2 is higher than that of sample set 3. Here, high quality may mean, for example, that the samples in the sample set have few missing features, that the number of features is large, or that the number of samples with an observed outcome event (that is, patient death/recovery) is large.

Next, in the embodiment of the present application, sample set 1 is used as the training sample set. Preset model 1 is obtained by training based on the method described in S101-S102 above, model 2 is obtained by training based on the existing coxPH method, and model 3 is obtained by training based on the DeepSurv method.

Then, take sample set 2 and sample set 3 as verification sample sets to verify the preset model 1, model 2 and model 3.

Table 4 shows the concordance index (C-index) of preset model 1, model 2 and model 3 after verification on the same verification samples. It should be understood that the C-index is used to evaluate the predictive ability of a model. It can be seen that, on the same verification samples, the C-index of preset model 1 obtained by the method provided in the embodiment of the present application is higher than the C-index of model 2 trained by the existing coxPH method, and higher than the C-index of model 3 trained by the existing DeepSurv method.
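
For reference, the following Python code sketches how a C-index of the commonly used Harrell form can be computed; whether the values in Table 4 were computed with exactly this variant is not stated, so this is an assumption. A pair of samples is comparable when the sample with the shorter observed time experienced the event, and the pair is concordant when that sample also has the higher predicted risk. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Fraction of comparable pairs whose predicted risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical verification data: observed times, event flags, predicted risks.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [True, True, False, True]
toy_risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a higher C-index indicates better predictive ability.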

Table 4

Model	Sample set 2	Sample set 2 + Sample set 3
Preset model 1 (method of this application)	0.6665	0.5793
Model 2 (coxPH method)	0.5448	0.4938
Model 3 (DeepSurv method)	0.6148	0.5630

Example 2. A progression prediction model for clinical disease A

In this example, hospital A has recorded clinical data for 2700 patients, ie hospital A includes 2700 samples. In addition, hospital B has recorded clinical data of 1400 patients, that is, hospital B includes 1400 samples.

In the embodiment of the present application, the samples of hospital A are used as training samples, and the preset model is obtained by training through the method described in S101-S102 above. The model is internally validated on part of the hospital A samples based on the 10-fold (10×) cross-validation method, and externally validated on the hospital B samples.

Here, 10-fold (10×) cross-validation means the following: the sample set is divided into 10 groups; 9 groups of samples are used as training samples to train the model, and the remaining group is used as verification samples to test and verify the model trained on those 9 groups. This process is repeated 10 times so that each group of samples is used once as the verification samples. The results of the 10 verifications are then averaged to obtain the result of the 10-fold cross-validation.
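
The procedure just described can be sketched in Python as follows. The helper names `k_fold_indices` and `cross_validate`, the random shuffling with a fixed seed, and the `train_and_score` callback are assumptions for illustration; in practice the callback would train a model on the training indices and return, for example, its C-index on the verification indices.

```python
import random
from typing import Callable, List


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k groups of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(
    n_samples: int,
    train_and_score: Callable[[List[int], List[int]], float],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Average the score over k rounds, each holding out a different group."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / len(scores)


# Placeholder scorer (not a real model): fraction of samples held out per round.
mean_score = cross_validate(2700, lambda tr, va: len(va) / (len(tr) + len(va)))
```

With the 2700 samples of hospital A, each round trains on 2430 samples and verifies on the remaining 270.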

Referring to FIG. 9, FIG. 9 shows a bar graph of the C-index obtained in internal verification and external verification for the models trained on the hospital A samples, based on the method provided by the embodiment of the present application and on existing methods.

As shown in FIG. 9, the checkered bars indicate the C-index of the model trained on the hospital A samples with the existing DeepSurv method, both after 10-fold cross-validation and after external validation on the hospital B samples. The striped bars indicate the C-index of the model trained on the hospital A samples with the existing coxnet method (the coxnet method is an improved version of the coxPH method), both after 10-fold cross-validation and after external validation on the hospital B samples. The white bars indicate the C-index of the model trained on the hospital A samples with the method provided in the embodiment of this application, both after 10-fold cross-validation and after external validation on the hospital B samples.

It can be seen that, based on the same training samples and the same verification samples, the C-index of the preset model trained by the method provided in the embodiment of the present application is higher than the C-index of the models trained by the existing DeepSurv method and coxnet method.

To sum up, in the method for predicting the survival risk rate provided by the embodiment of the present application, a preset model including a gating network and multiple expert networks is used to predict the sample to be predicted, so the survival risk rate predicted by the method is more accurate, which in turn improves the accuracy of the survival curve determined based on the survival risk rate.

In addition, since the preset model used in the method of the embodiment of the present application can be obtained through end-to-end training, it is convenient to explain the model at different levels (global and local). The contribution of the sample features to the predicted value of the sample, obtained from the explanation, can then be used to guide the improvement of samples in real scenes.

The foregoing mainly introduces the solutions provided by the embodiments of the present application from the perspective of methods. In order to realize the above functions, the apparatus includes corresponding hardware structures and/or software modules for performing the various functions. Those skilled in the art should easily realize that, in combination with the units and algorithm steps of each example described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

The embodiment of the present application can divide the functional modules of the device for predicting the survival risk rate according to the above method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module . The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.

As shown in FIG. 10 , FIG. 10 shows a schematic structural diagram of an apparatus 100 for predicting survival risk provided by an embodiment of the present application. The apparatus 100 for predicting the survival risk rate may be used to implement the above-mentioned method for predicting the survival risk rate, for example, to perform the method shown in FIG. 6 . Wherein, the apparatus 100 for predicting the survival risk rate may include an acquisition unit 101 and a processing unit 102 .

The obtaining unit 101 is configured to obtain data of samples to be predicted. The processing unit 102 is configured to input the data of the sample to be predicted into a preset model, and process the data of the sample to be predicted through the preset model to obtain a survival risk rate representing the survival risk of the sample to be predicted. The preset model includes a gating network and a plurality of expert networks. The gating network is used to determine the weight coefficient corresponding to each expert network according to the data of the sample to be predicted, and the survival risk rate is the result obtained by performing a weighted summation of the output values of the plurality of expert networks according to the weight coefficient corresponding to each expert network.

As an example, with reference to FIG. 6 , the obtaining unit 101 may be used to execute S201, and the processing unit 102 may be used to execute S202.

Optionally, the apparatus 100 for predicting the survival risk rate further includes: a determination unit 103, configured to determine the risk function of the sample to be predicted based on the survival risk rate of the sample to be predicted and the baseline risk function, wherein the risk function of the sample to be predicted is used to indicate the survival rate of the sample to be predicted at different times.

Optionally, any one of the plurality of expert networks included in the prediction model includes at least one candidate RFCN, and the output value of any one expert network is an output value satisfying a preset condition among the output values of at least one candidate RFCN .

Optionally, the data of the samples to be predicted include non-Euclidean data.

Optionally, the apparatus 100 for predicting the survival risk rate further includes: an interpretation unit 104, configured to interpret the preset model based on the data of the sample to be predicted and the survival risk rate of the sample to be predicted, so as to obtain the influence of different feature data in the data of the sample to be predicted on the survival risk rate.

Optionally, when the data of the sample to be predicted is the case data of a patient, the interpretation unit 104 is specifically configured to: interpret the preset model based on the case data of the patient and the survival risk rate of the patient, so as to obtain the influence of different feature data in the case data of the patient on the survival risk rate of the patient.

Optionally, when the data of the sample to be predicted is the data of a device, the interpretation unit 104 is specifically configured to: interpret the preset model based on the data of the device and the survival risk rate of the device, so as to obtain the influence of different feature data in the data of the device on the survival risk rate of the device.

For a specific description of the foregoing optional manners, reference may be made to the foregoing method embodiments, and details are not repeated here. In addition, the explanation and the description of the beneficial effects of any of the apparatus 100 for predicting the survival risk rate provided above may refer to the above corresponding method embodiments, and details are not repeated here.

As an example, with reference to FIG. 2, the function realized by the acquisition unit 101 in the apparatus 100 for predicting survival risk rate can be realized through the communication interface 24 in FIG. 2, and the functions realized by the processing unit 102, the determination unit 103 and the interpretation unit 104 can be realized by the processor 11 in FIG. 2 executing the program code in the main memory 22 in FIG. 2.

Fig. 11 shows a schematic structural diagram of a signal-carrying medium for carrying a computer program product provided by an embodiment of the present application. The signal-carrying medium is used for storing a computer program product or a computer program for executing a computer process on a computing device.

As shown in FIG. 11 , signal-bearing medium 110 may include one or more program instructions that, when executed by one or more processors, may provide the functions or portions of the functions described above with respect to FIG. 6 . Thus, for example, one or more features referred to in S201 - S202 in FIG. 6 may be undertaken by one or more instructions associated with the signal bearing medium 110 . Additionally, the program instructions in FIG. 11 also describe example instructions.

In some examples, signal bearing medium 110 may comprise computer readable medium 111 such as, but not limited to, a hard drive, compact disc (CD), digital video disc (DVD), digital tape, memory, read-only memory (ROM) or random access memory (RAM), and so on.

In some implementations, signal bearing media 110 may comprise computer recordable media 112 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.

In some implementations, signal bearing medium 110 may include communication media 113 such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).

The signal bearing medium 110 may be conveyed by a wireless form of communication medium 113 (eg, a wireless communication medium conforming to the IEEE 1902.11 standard or other transmission protocol). One or more program instructions may be, for example, computer-executable instructions or logic-implementing instructions.

In some examples, an apparatus for predicting survival risk, such as that described with respect to FIG. Instructions provide various operations, functions, or actions.

It should be understood that the arrangements described herein are for example purposes only. Accordingly, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, sequences, and groups of functions, etc.) can be used instead, and some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components, or implemented in conjunction with other components in any suitable combination and location.

In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof. When implemented using a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. A computer can be a general purpose computer, special purpose computer, computer network, or other programmable device. Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), etc.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims

1. A method for predicting survival risk, characterized in that it comprises:

obtaining data of a sample to be predicted;

inputting the data of the sample to be predicted into a preset model, and processing the data of the sample to be predicted through the preset model to obtain a survival risk rate (HR) of the sample to be predicted, the survival risk rate being used to indicate the survival risk of the sample to be predicted;

wherein the preset model includes a gating network and a plurality of expert networks, the gating network is used to determine, according to the data of the sample to be predicted, a weight coefficient corresponding to each expert network, and the survival risk rate is a result obtained by performing a weighted summation of the output values of the plurality of expert networks according to the weight coefficient corresponding to each expert network.
2. The method according to claim 1, further comprising:

determining a risk function of the sample to be predicted based on the survival risk rate and a reference risk function, the risk function being used to indicate the survival rate of the sample to be predicted at different times.
3. The method according to claim 1 or 2, wherein any one expert network of the plurality of expert networks comprises at least one candidate residual fully connected neural network (RFCN), and the output value of the any one expert network is an output value that satisfies a preset condition among the output values of the at least one candidate RFCN.
4. The method according to any one of claims 1-3, wherein the data of the sample to be predicted includes non-Euclidean data.
5. The method according to any one of claims 1-4, wherein the method further comprises:

interpreting the preset model based on the data of the sample to be predicted and the survival risk rate of the sample to be predicted, to obtain the influence of different feature data in the data of the sample to be predicted on the survival risk rate.
6. The method according to claim 5, wherein when the data of the sample to be predicted is case data of a patient, the interpreting of the preset model based on the data of the sample to be predicted and the survival risk rate of the sample to be predicted, to obtain the influence of different feature data in the data of the sample to be predicted on the survival risk rate, comprises:

interpreting the preset model based on the case data of the patient and the survival risk rate of the patient, to obtain the influence of different feature data in the case data of the patient on the survival risk rate of the patient.
7. The method according to claim 5, wherein when the data of the sample to be predicted is data of equipment, the interpreting of the preset model based on the data of the sample to be predicted and the survival risk rate of the sample to be predicted, to obtain the influence of different feature data in the data of the sample to be predicted on the survival risk rate, comprises:

interpreting the preset model based on the data of the equipment and the survival risk rate of the equipment, to obtain the influence of different feature data in the data of the equipment on the survival risk rate of the equipment.
8. The method according to any one of claims 1-7, further comprising:

training an initial model using data of training samples to obtain the preset model, wherein the initial model includes an initial gating network and a plurality of initial expert networks.
9. The method according to claim 8, wherein the training of the initial model using the data of the training samples comprises:

inputting the data of the training samples into the initial gating network and the plurality of initial expert networks in the initial model;

obtaining a weight coefficient of each initial expert network according to the initial gating network, and performing a weighted summation of the output values of the plurality of initial expert networks according to the weight coefficient corresponding to each initial expert network, to obtain a predicted survival risk rate of the training samples;

determining a loss function based on the predicted survival risk rate of the training samples and the survival data of the training samples; and

adjusting network parameters of the initial gating network and the plurality of initial expert networks based on the loss function.
10. A device for predicting survival risk, characterized by comprising:

an acquisition unit, configured to acquire data of a sample to be predicted;

a processing unit, configured to input the data of the sample to be predicted into a preset model, and process the data of the sample to be predicted through the preset model to obtain a survival risk rate (HR) of the sample to be predicted, the survival risk rate being used to indicate the survival risk of the sample to be predicted;

wherein the preset model includes a gating network and a plurality of expert networks, the gating network is used to determine, according to the data of the sample to be predicted, a weight coefficient corresponding to each expert network, and the survival risk rate is a result obtained by performing a weighted summation of the output values of the plurality of expert networks according to the weight coefficient corresponding to each expert network.
11. The device according to claim 10, further comprising:

a determining unit, configured to determine a risk function of the sample to be predicted based on the survival risk rate and a reference risk function, the risk function being used to indicate the survival rate of the sample to be predicted at different times.
12. The device according to claim 10 or 11, wherein any one expert network of the plurality of expert networks includes at least one candidate residual fully connected neural network (RFCN), and the output value of the any one expert network is an output value that satisfies a preset condition among the output values of the at least one candidate RFCN.
13. The device according to any one of claims 10-12, wherein the data of the sample to be predicted includes non-Euclidean data.
14. The device according to any one of claims 10-13, wherein the device further comprises:

an interpretation unit, configured to interpret the preset model based on the data of the sample to be predicted and the survival risk rate of the sample to be predicted, to obtain the influence of different feature data in the data of the sample to be predicted on the survival risk rate.
15. The device according to claim 14, wherein when the data of the sample to be predicted is case data of a patient, the interpretation unit is specifically configured to:

interpret the preset model based on the case data of the patient and the survival risk rate of the patient, to obtain the influence of different feature data in the case data of the patient on the survival risk rate of the patient.
16. The device according to claim 14, wherein when the data of the sample to be predicted is data of equipment, the interpretation unit is specifically configured to:

interpret the preset model based on the data of the equipment and the survival risk rate of the equipment, to obtain the influence of different feature data in the data of the equipment on the survival risk rate of the equipment.
17. A device for predicting survival risk, characterized by comprising one or more processors and a memory, wherein the one or more processors are configured to invoke program instructions stored in the memory to execute the method according to any one of claims 1-9.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium includes program instructions, and when the program instructions are run on a computer or a processor, the computer or the processor is caused to execute the method according to any one of claims 1-9.
19. A computer program product, characterized in that when the computer program product is run on a device for predicting survival risk, the device is caused to execute the method according to any one of claims 1-9.
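For illustration only, and without limiting the claims above: the training steps determine a loss function from the predicted survival risk rate and the survival data of the training samples, but the form of that loss is left open here. One common choice for Cox-style models is the negative partial log-likelihood, sketched below in Python. The function name `cox_partial_nll` is hypothetical, treating the weighted-sum output of the model as a log-risk score is an assumption, and ties in event times are handled in the simple Breslow manner.

```python
import math
from typing import Sequence


def cox_partial_nll(scores: Sequence[float],
                    times: Sequence[float],
                    events: Sequence[int]) -> float:
    """Negative Cox partial log-likelihood, averaged over observed events.

    scores: model outputs, treated here as log-risk scores (assumption).
    times:  observed follow-up time of each training sample.
    events: 1 if the target event was observed, 0 if the sample is censored.
    """
    n_events = sum(events)
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    loss = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored samples contribute only through risk sets
        # Risk set: samples still under observation at time t_i.
        risk = [s for s, t in zip(scores, times) if t >= t_i]
        peak = max(risk)  # log-sum-exp trick for numerical stability
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in risk))
        loss -= scores[i] - log_sum
    return loss / n_events


# Toy survival data: three samples, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
flat_loss = cox_partial_nll([0.0, 0.0, 0.0], times, events)
ranked_loss = cox_partial_nll([2.0, 1.0, 0.0], times, events)
print(flat_loss, ranked_loss)
```

The loss is smaller when samples whose events occur earlier receive higher scores, which is why `ranked_loss` is below `flat_loss` in the toy data. In practice the parameter adjustment would be carried out by a gradient-based optimizer from an automatic-differentiation framework, which is outside the scope of this sketch.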