WO2021190661A1 - Data processing system, method, apparatus, and storage medium - Google Patents

Data processing system, method, apparatus, and storage medium

Info

Publication number
WO2021190661A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
target disease
feature
neural network
memory cell
Prior art date
Application number
PCT/CN2021/084227
Inventor
贾文笑
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021190661A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • This application relates to the field of data analysis technology as applied to the medical field, and specifically to a data processing system, a data processing method, a data processing device, a terminal device, and a computer-readable storage medium.
  • The inventor realized that, in the medical field, a patient who has been cured of a certain disease (that is, the target disease) may still suffer a recurrence, and that the probability of recurrence increases if the patient takes no preventive measures. Accurately predicting recurrence of the target disease is therefore very important: it gives clinicians decision support for early prevention and targeted treatment plans.
  • The embodiments of the present application provide a data processing system, method, device, and storage medium that can improve the accuracy of the prediction result for recurrence of the target disease in the target patient.
  • A first aspect of the embodiments of the present application provides a data processing system that includes a storage device and a terminal device, wherein:
  • the storage device is configured to store the time series data sets of at least one patient associated with the target disease, where the time series data set of any patient includes the disease statistical data associated with the target disease for that patient at each time node;
  • the terminal device is configured to obtain a target time series data set of a target patient from the time series data sets stored in the storage device, the target time series data set including n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n];
  • the terminal device is further configured to input the target time series data set into a target disease recurrence prediction model, where the model includes n feature processing modules and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • the terminal device is also used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m ∈ [2, n]; and to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
  • A second aspect of the embodiments of the present application provides a data processing method, including:
  • obtaining a target time series data set of a target patient, the target time series data set including n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n];
  • inputting the target time series data set into a target disease recurrence prediction model, where the model includes n feature processing modules and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m ∈ [2, n];
  • determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
  • A third aspect of the embodiments of the present application provides a data processing device, including:
  • a module for obtaining a target time series data set of a target patient, the target time series data set including n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n];
  • a processing module for inputting the target time series data set into a target disease recurrence prediction model, where the model includes n feature processing modules and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • the processing module is further configured to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m ∈ [2, n];
  • the processing module is further configured to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
  • A fourth aspect of the embodiments of the present application provides a terminal device, including a processor and a memory connected to each other, where the memory is used to store a computer program, the computer program includes a program, and the processor is configured to call the program and execute the following method:
  • obtaining a target time series data set of a target patient, the target time series data set including n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n];
  • inputting the target time series data set into a target disease recurrence prediction model, where the model includes n feature processing modules and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m ∈ [2, n];
  • determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
  • A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to execute the following method:
  • obtaining a target time series data set of a target patient, the target time series data set including n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n];
  • inputting the target time series data set into a target disease recurrence prediction model, where the model includes n feature processing modules and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m ∈ [2, n];
  • determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
  • The target disease recurrence prediction model takes into account the sequential relationship between the disease statistical data at successive time nodes in the target time series data set, which helps improve the accuracy of the prediction result for the target patient's recurrence of the target disease.
  • FIG. 1 is a schematic diagram of the architecture of a data processing system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a target disease recurrence prediction model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another target disease recurrence prediction model provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another target disease recurrence prediction model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • The technical solution of the present application may involve the fields of artificial intelligence and/or big data technology. In particular, it may involve neural network technology and can be applied to scenarios such as smart healthcare, supporting digital healthcare and the construction of smart cities.
  • the data involved in this application such as time series data, feature information, and/or prediction results, can be stored in a database, or can be stored in a blockchain, which is not limited in this application.
  • An embodiment of this application proposes a data processing system that includes a storage device and a terminal device. The terminal device can be any of the following: a smartphone, a tablet computer, a laptop computer or other portable device, a desktop computer, and so on. In this system:
  • the storage device is used to store the time series data sets of at least one patient associated with the target disease, where the time series data set of any patient includes the disease statistical data associated with the target disease for that patient at each time node. The storage device may also store target historical medical data of at least one patient associated with the target disease.
  • any of the above-mentioned patients is a patient who has suffered or is suffering from the target disease, and the disease statistics of any patient may include laboratory examinations and demographic characteristics of any patient, as well as some temporal factors.
  • the target disease can be disease 1, disease 2 (for example, myocardial infarction), disease 3 (for example, heart failure), etc.
  • the disease statistics associated with disease 1 may include: the blood hemoglobin content of the patient and the interval from the date of physical examination to the remission time of disease 1, IgG staining intensity, blood albumin, urine protein quantification, disease 1 remission type, and so on.
  • the storage device may refer to the server corresponding to the patient monitoring system, or the node device in the blockchain network.
  • The historical medical data generated by each patient's visits, diagnosis and treatment, laboratory examinations, and surgery related to the target disease can be uploaded to the patient monitoring system. The patient monitoring system can record the disease statistical data of each patient associated with the target disease at each time node as a time series data set, and store the time series data set of each patient in the server.
  • Alternatively, after receiving the disease statistical data of each patient associated with the target disease at each time node, the patient monitoring system can send that data to a node device in the blockchain network. The node device can assemble the disease statistical data of each patient at each time node into a time series data set and write the time series data set of each patient into the blockchain.
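The grouping described above, from per-visit records to one time-ordered data set per patient, can be sketched as follows. This is a minimal illustration rather than the application's implementation: the record layout `(patient_id, disease_id, time_node, statistics)`, the field `hemoglobin`, and the function name `build_time_series` are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (patient_id, disease_id, time_node, statistics).
Record = Tuple[str, str, int, dict]


def build_time_series(records: List[Record]) -> Dict[Tuple[str, str], List[dict]]:
    """Group records by (patient, disease) and order each group by time node."""
    grouped = defaultdict(list)
    for patient_id, disease_id, time_node, stats in records:
        grouped[(patient_id, disease_id)].append((time_node, stats))
    return {
        key: [stats for _, stats in sorted(items, key=lambda item: item[0])]
        for key, items in grouped.items()
    }


records = [
    ("p1", "disease_1", 2, {"hemoglobin": 128}),
    ("p1", "disease_1", 1, {"hemoglobin": 120}),
    ("p2", "disease_1", 1, {"hemoglobin": 135}),
]
datasets = build_time_series(records)
# Target time series data set (x_1, x_2) for a hypothetical target patient "p1".
target = datasets[("p1", "disease_1")]
```

Keying the result by patient and disease means a later lookup for a target patient is a single dictionary access.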
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
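The idea of data blocks linked by cryptographic methods can be illustrated with a toy hash chain. This sketch only shows how each block's hash covers its payload and the previous block's hash, so that tampering is detectable. It is not a real blockchain (no consensus, no network), and the block layout and function names are assumptions for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(payload: dict, prev_hash: str) -> str:
    """SHA-256 over the payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> None:
    """Append a block whose hash links it to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"payload": payload, "prev_hash": prev_hash,
                  "hash": block_hash(payload, prev_hash)})


def chain_is_valid(chain: list) -> bool:
    """Check every link and recompute every hash."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["payload"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"patient": "p1", "time_node": 1, "hemoglobin": 120})
append_block(chain, {"patient": "p1", "time_node": 2, "hemoglobin": 128})
```

Changing any stored value after the fact makes `chain_is_valid` return `False`, because the recomputed hash no longer matches the stored one.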
  • The terminal device is used to obtain the target time series data set of the target patient from the time series data sets stored in the storage device. The target time series data set includes n target disease statistical data items of the target patient associated with the target disease at n time nodes, where n is an integer greater than 1 and the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t, t ∈ [1, n].
  • The terminal device is also used to input the target time series data set into the target disease recurrence prediction model. The model includes n feature processing modules, and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t.
  • The terminal device is also used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m ∈ [2, n]; and to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module, where the first feature information h_1 is determined according to the first target disease statistical data x_1.
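The data flow through the n feature processing modules can be sketched as a simple chain in which h_1 depends only on x_1 and each later h_m depends on x_m and h_{m-1}. The sketch shows only this dependency structure. The function name `run_feature_chain` and the toy modules (an element-wise average) are hypothetical placeholders, not the operations the application uses.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_feature_chain(
    xs: Sequence[Vector],
    first_module: Callable[[Vector], Vector],
    step_module: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    """Return [h_1, ..., h_n] for inputs [x_1, ..., x_n]."""
    if len(xs) < 2:
        raise ValueError("n must be an integer greater than 1")
    hs = [first_module(xs[0])]  # h_1 is determined by x_1 alone
    for x_m in xs[1:]:  # m = 2, ..., n
        hs.append(step_module(x_m, hs[-1]))  # h_m from x_m and h_{m-1}
    return hs


# Toy stand-ins (assumptions): h_1 = x_1, h_m = element-wise mean of x_m and h_{m-1}.
def toy_first(x: Vector) -> Vector:
    return list(x)


def toy_step(x: Vector, h_prev: Vector) -> Vector:
    return [0.5 * (a + b) for a, b in zip(x, h_prev)]


hs = run_feature_chain([[1.0], [3.0], [5.0]], toy_first, toy_step)
h_n = hs[-1]  # the last feature information drives the prediction
```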
  • The n feature processing modules include n recurrent neural network cell mask groups and n hidden layers, where the t-th recurrent neural network cell mask group corresponds one-to-one to the t-th hidden layer.
  • The terminal device is specifically used to call the t-th recurrent neural network cell mask group to perform word embedding on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group; and to update the t-th memory cell to obtain the t-th target memory cell C_t, and input C_t into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group.
  • The terminal device is specifically used to call the t-th recurrent neural network cell mask group to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
  • The terminal device is specifically configured to call the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input by the t-th recurrent neural network cell mask group, to obtain and output the t-th feature information h_t.
  • The target disease recurrence prediction model further includes an attention mechanism layer. The terminal device is also used to analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer; generate the t-th target weight w_t for h_t based on the analysis result; update h_t to h_t * w_t; and input h_t * w_t into the (t+1)-th recurrent neural network cell mask group. It then calls the (t+1)-th recurrent neural network cell mask group to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t * w_t input to that group, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
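The reweighting step h_t → h_t * w_t can be sketched as below. The text does not say how the attention mechanism layer computes w_t, so the scoring used here (a sigmoid of a dot product with a vector `v`) is purely an assumption for illustration, as are the function names.

```python
import math
from typing import List


def attention_weight(h_t: List[float], v: List[float]) -> float:
    """Hypothetical scoring: w_t = sigmoid(v . h_t), a scalar in (0, 1)."""
    score = sum(a * b for a, b in zip(v, h_t))
    return 1.0 / (1.0 + math.exp(-score))


def reweight(h_t: List[float], v: List[float]) -> List[float]:
    """Return h_t * w_t, the value passed on to the (t+1)-th cell group."""
    w_t = attention_weight(h_t, v)
    return [w_t * value for value in h_t]


h_t = [0.4, -0.2]
weighted = reweight(h_t, v=[0.0, 0.0])  # a zero score gives w_t = 0.5
```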
  • the prediction result of the target patient's recurrence of the target disease includes the probability of the target patient's recurrence of the target disease at each time node.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the data processing method can be executed by the terminal device in the above-mentioned data processing system.
  • the data processing method includes the following steps:
  • the target time series data set includes n target disease statistical data associated with the target disease at n time nodes of the target patient, and n is an integer greater than 1.
  • the t-th time node among the n time nodes corresponds to the t-th target disease statistical data x_t among the n target disease statistical data, t ∈ [1, n].
  • the target disease can be heart failure, myocardial infarction, sepsis, etc.
  • the terminal device can interact with the storage device, and the storage device can refer to the server corresponding to the monitoring system or the node in the blockchain network.
  • the storage device pre-stores at least one patient and each time series data set associated with the target disease, and the time series data set of any patient includes disease statistical data associated with the target disease at each time node of the any patient.
  • the terminal device runs an application program corresponding to the disease recurrence prediction platform, or can open a web page corresponding to the disease recurrence prediction platform.
  • Any user can log in to the disease recurrence prediction platform and submit a disease recurrence prediction request for the target patient.
  • the disease recurrence prediction request includes the identity information of the target patient and the identification information of the target disease.
  • the identity information of the target patient may refer to the document number of the target patient, or a medical record number that can uniquely identify the target patient, etc.
  • After the terminal device detects the disease recurrence prediction request submitted by the user, it can, based on the identity information and the identification information of the target disease, obtain the target patient's target time series data set from the time series data sets of at least one patient associated with the target disease stored in the storage device.
  • Alternatively, the time series data sets of at least one patient associated with the target disease can be written into the blockchain in advance, and the terminal device can subsequently obtain the target patient's target time series data set from the blockchain through a node device in the blockchain network.
  • The target disease recurrence prediction model includes n feature processing modules, and the t-th feature processing module is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t.
  • The network structure diagram of the target disease recurrence prediction model may be as shown in FIG. 3.
  • The input of the target disease recurrence prediction model is time series data, and the output prediction result can also be time series data. For example, the prediction result can be a time series [y_1, y_2, y_3, ..., y_k] representing the target patient's probability of recurrence of the target disease at each time node.
  • Time series sample data sets of a large number of sample patients can be collected. The time series sample data set of any sample patient includes that patient's disease statistical sample data associated with the target disease at each time node (the data types are the same as for the disease statistical data above, but the specific content or values differ between patients).
  • The time series sample data set of each sample patient can be used as a training sample, and a label can be added to each training sample.
  • For the prediction target of each individual (i.e., patient), the embodiment of the present application retains two variables.
  • One variable, time, represents the follow-up censoring time.
  • The other variable, event, represents whether the event occurred (that is, whether the target disease recurred): 1 for recurrence and 0 for no recurrence.
  • For example, if the maximum follow-up time is 26 years, this embodiment can set the predicted time series to 27 time points, each representing one year, from year 1 to year 27.
  • The label can be T(e), where T represents the year and e is 1 in the case of recurrence and 0 in the case of no recurrence or follow-up censoring.
  • The model is subsequently trained on such training samples, and its output will be a time series (i.e., y_1, y_2, ..., y_k in FIG. 3).
  • Follow-up censoring refers to the following situation. A patient who suffered from the target disease is followed up regularly after being cured, and the doctor can assess the patient's health from the follow-up records. Some patients, however, may be followed up for only a few years (for example, 3 years) without the target disease recurring in that period, and have no data records for the years that follow. For such censored sample patients, it is impossible to determine whether the target disease eventually recurred. Conventionally, censored sample patients cannot be used as training samples for a target disease recurrence prediction model. In this solution, however, labels are annotated with two variables, so even sample patients censored during follow-up can be used as training samples, which helps handle the censoring problem and reduces the difficulty of sample selection.
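One way the two variables (time, event) could be turned into a per-year training target is sketched below. The encoding is an assumption made for illustration, since the text gives only the label form T(e). Here, years with a known status get a mask value of 1; after a recurrence every later year is labelled 1; and years after a censoring time are masked out so that they would not contribute to a loss. The function name `make_label` and the `horizon` parameter are hypothetical.

```python
from typing import List, Tuple


def make_label(time_year: int, event: int, horizon: int = 27) -> Tuple[List[int], List[int]]:
    """Build (targets, mask) over `horizon` yearly time points (assumed encoding).

    time_year: year of recurrence (event=1) or of last follow-up (event=0).
    """
    if not 1 <= time_year <= horizon:
        raise ValueError("time_year must lie within the prediction horizon")
    if event not in (0, 1):
        raise ValueError("event must be 0 or 1")
    targets = [0] * horizon
    mask = [0] * horizon
    for year in range(1, horizon + 1):
        if year < time_year:
            mask[year - 1] = 1      # observed recurrence-free
        elif event == 1:
            targets[year - 1] = 1   # recurred in or before this year
            mask[year - 1] = 1
        elif year == time_year:
            mask[year - 1] = 1      # still recurrence-free at last follow-up
        # later years of a censored patient stay masked out (status unknown)
    return targets, mask


recurred = make_label(3, 1, horizon=5)
censored = make_label(3, 0, horizon=5)
```

With this encoding a censored patient still contributes the years that were actually observed, which matches the stated aim of keeping censored patients as training samples.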
  • The target disease recurrence prediction model can output the prediction result of the target patient's recurrence of the target disease at each preset time node.
  • For example, the preset time nodes, counted from the current time, are the first, second, third, fourth, and fifth years in the future. If the current time is 2020, the preset time nodes are 2021, 2022, 2023, 2024, and 2025.
  • The output time series may then indicate, for example, that the target disease will not recur in the first year (2021, probability 0), the second year (2022, probability 0), the third year (2023, probability 0), or the fourth year (2024, probability 0), and will recur in the fifth year (2025, probability 1).
  • The prediction result may also be obtained by summing the values y_a (a ∈ [1, k]) of the time series [y_1, y_2, y_3, ..., y_k] over the preset time nodes, giving a total probability of recurrence S(k) = y_1 + y_2 + ... + y_k.
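The summation S(k) can be computed for every horizon at once with a running sum. The sketch below assumes the y_a values are given as a plain list, and the function name is hypothetical. Note that the sum can be read as a probability only if the per-node values are probabilities of mutually exclusive outcomes (for example, first recurrence in year a), which is an assumption the text does not state.

```python
from itertools import accumulate
from typing import List


def cumulative_recurrence(y: List[float]) -> List[float]:
    """Return [S(1), ..., S(k)], where S(j) = y_1 + ... + y_j."""
    return list(accumulate(y))


s = cumulative_recurrence([0.0, 0.25, 0.5])
total = s[-1]  # S(k) for k = 3
```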
  • The n feature processing modules include n recurrent neural network cell mask groups and n hidden layers, where the t-th recurrent neural network cell mask group corresponds one-to-one to the t-th hidden layer.
  • The recurrent neural network cell mask group may be, for example, an LSTM (Long Short-Term Memory) cell mask group.
  • Any m-th feature processing module (that is, any of the n-1 feature processing modules other than the first) performs word embedding and determines and outputs the m-th feature information h_m as follows, with t denoting the module index. The t-th recurrent neural network cell mask group performs word embedding on the target disease statistical data x_t input to it and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group. The t-th memory cell is updated to obtain the t-th target memory cell C_t, and C_t is input into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group. The t-th hidden layer then performs feature compression on C_t to obtain and output the t-th feature information h_t.
  • The t-th memory cell corresponding to the t-th recurrent neural network cell mask group, obtained by performing word embedding on the target disease statistical data x_t input to that group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, can be calculated according to Formula 1.1.
  • Here w_j and u_j are both control gate functions, used to control the proportion of the corresponding information that is retained: w_j controls the retained proportion of the (t-1)-th feature information h_{t-1}, and u_j controls the retained proportion of the input. b_j is a bias matrix, which is fixed once model training is complete.
  • The t-th memory cell is updated to obtain the t-th target memory cell C_t as follows: the t-th recurrent neural network cell mask group is called to update the t-th memory cell according to the forget gate function and the input gate function.
  • Here f_t is the forget gate function, i_t is the input gate function, and C_{t-1} is the (t-1)-th target memory cell. That is, from the perspective of the t-th recurrent neural network cell mask group, C_{t-1} is the target memory cell obtained from the previous recurrent neural network cell mask group (the (t-1)-th recurrent neural network cell mask group).
  • In Equation 1.3 and Equation 1.4, σ represents the sigmoid activation function, and all w and b represent the corresponding weight matrices and biases.
  • The terminal device may call the t-th hidden layer to perform feature compression, according to the output gate function, on the t-th target memory cell C_t input by the t-th recurrent neural network cell mask group, to obtain and output the t-th feature information h_t.
  • The feature compression described above can be performed according to Equation 1.5 and Equation 1.6, where f_o is the output gate and h_t is the output of the t-th hidden layer.
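The equations referenced in this passage (1.1 to 1.6) did not survive in the text. For orientation only, the textbook LSTM update has the form below, written with the symbols used in the description. The candidate-cell symbol \tilde{C}_t, the tanh nonlinearities, the gate-specific subscripts on w, u, and b, and the mapping of lines to equation numbers are assumptions based on the standard LSTM formulation, not a reconstruction of the application's own formulas.

```latex
\begin{aligned}
\tilde{C}_t &= \tanh\!\left(w_j\, h_{t-1} + u_j\, x_t + b_j\right)
  && \text{(candidate memory cell; cf. Formula 1.1)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
  && \text{(forget and input gates applied)} \\
f_t &= \sigma\!\left(w_f\, h_{t-1} + u_f\, x_t + b_f\right)
  && \text{(forget gate)} \\
i_t &= \sigma\!\left(w_i\, h_{t-1} + u_i\, x_t + b_i\right)
  && \text{(input gate)} \\
f_o &= \sigma\!\left(w_o\, h_{t-1} + u_o\, x_t + b_o\right)
  && \text{(output gate)} \\
h_t &= f_o \odot \tanh\!\left(C_t\right)
  && \text{(feature compression into } h_t\text{)}
\end{aligned}
```

Here \sigma is the sigmoid function and \odot denotes element-wise multiplication.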
  • The network structure diagram of the target disease recurrence prediction model may be as shown in FIG. 4.
  • For example, if the target time series data set includes 3 target disease statistical data items associated with the target disease at 3 time nodes of the target patient, namely x_1, x_2, and x_3, then x_1 can be input into the first LSTM cell mask group, x_2 into the second LSTM cell mask group, and x_3 into the third LSTM cell mask group.
  • The first LSTM cell mask group can extract the feature information of x_1 and input that feature information into the first hidden layer.
  • The first hidden layer can perform feature compression on the feature information to obtain the first feature information h_1 and input h_1 into the second LSTM cell mask group. The second LSTM cell mask group can perform word embedding on h_1 and the input x_2 to obtain the second memory cell corresponding to the second LSTM cell mask group, update the second memory cell to obtain the second target memory cell C_2, and input C_2 into the second hidden layer corresponding to the second LSTM cell mask group. The second hidden layer then performs feature compression on C_2 to obtain the second feature information h_2 and outputs h_2 to the third LSTM cell mask group.
  • The third LSTM cell mask group can perform word embedding on the input x_3 and the second feature information h_2 to obtain the third memory cell, update the third memory cell to obtain the third target memory cell C_3, and input C_3 into the third hidden layer corresponding to the third LSTM cell mask group.
  • The third hidden layer performs feature compression on the third target memory cell C_3 to obtain the third feature information h_3.
  • the target disease recurrence prediction model also includes a fully connected layer, and the terminal device can input the nth feature information h n output by the nth feature processing module in the n feature processing modules into the fully connected layer, and the fully connected layer n Feature information h n Perform data analysis to determine the prediction result of the target patient's recurrence of the target disease.
  • the above time series characterization the first year in the future (i.e. 2021) will not recur (i.e. the probability is 0), the next year (i.e. 2022) will not recur (i.e. the probability is 0), the third year in the future (i.e. 2023) Years) will not recur (that is, with a probability of 0), will not recur in the next four years (that is, in 2024) (that is, with a probability of 0), and will recur in the next five years (that is, in 2025) (that is, with a probability of 1).
  • the target disease recurrence prediction model further includes an attention mechanism layer.
  • the terminal device can analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, determine the t-th target weight w_t corresponding to h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group.
  • the (t+1)-th recurrent neural network cell mask group is then called to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
  • Exemplarily, the target disease recurrence prediction model after adding the attention mechanism layer may be as shown in FIG. 5.
  • for the target disease statistical data at different time nodes, the attention mechanism layer can assign different weights to the feature information at different time nodes along the time dimension: for example, a weight w_1 for h_1 at the first time node, a weight w_2 for h_2 at the second time node, and so on, up to a weight w_n for h_n at the n-th time node.
  • the feature information h_t obtained from the t-th target disease statistical data x_t at the t-th time node can be understood as the feature information h_t at the t-th time node.
  • the weights of the different time nodes are fixed once model training is complete.
  • alternatively, the attention mechanism layer can assign different weights to the feature information at different time nodes along the variable dimension; that is, during training the attention mechanism layer has already learned how much weight to assign to each type of variable.
  • in this case, the attention mechanism layer can determine the t-th target weight w_t corresponding to the t-th feature information h_t based on the analysis result.
  • after the model outputs the prediction result, the terminal device can obtain from the attention mechanism layer the weight assigned to the feature information h_t of each time node, and thereby determine the influence of the disease statistical data of each time node on the prediction result, which increases the interpretability of the model's output.
  • the target time series data set can be input into the target disease recurrence prediction model, and the m-th feature processing module in the model is called to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module.
  • the m-th feature information h_m, m∈[2, n], is then determined and output based on the word embedding result.
  • further, the prediction result of the target patient's recurrence of the target disease can be determined according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules. In this way, the target disease recurrence prediction model attends to the sequential relationship between the disease statistical data at the time nodes of the target time series data set, which helps improve the accuracy of the prediction result.
  • FIG. 6 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application.
  • the data processing device described in this embodiment can be configured in a terminal device and includes:
  • the acquiring module 60 is configured to acquire a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
  • the processing module 61 is configured to input the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
  • the processing module 61 is further configured to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n];
  • the processing module 61 is further configured to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined based on the first target disease statistical data x_1.
  • word embedding is performed, through the t-th recurrent neural network cell mask group, on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group;
  • the t-th memory cell is updated to obtain the t-th target memory cell C_t, and C_t is input into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group;
  • the processing module 61 is specifically configured to call the t-th recurrent neural network cell mask group to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
  • the processing module 61 is specifically configured to call the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to it, to obtain and output the t-th feature information h_t.
  • the target disease recurrence prediction model further includes an attention mechanism layer.
  • the processing module 61 is further configured to analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, generate the t-th target weight w_t for h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group.
  • the (t+1)-th recurrent neural network cell mask group is then called to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
  • the prediction result of the target patient's recurrence of the target disease includes the probability of the target patient's recurrence of the target disease at the respective preset time nodes.
  • each functional module of the data processing apparatus of this embodiment can be implemented according to the method of FIG. 2 in the above method embodiment; for the specific implementation process, refer to the related description of FIG. 2, which is not repeated here.
  • the data processing apparatus may input the target time series data set into the target disease recurrence prediction model, and call the m-th feature processing module in the model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module.
  • the m-th feature information h_m, m∈[2, n], is then determined and output based on the word embedding result.
  • further, the prediction result of the target patient's recurrence of the target disease can be determined according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules. In this way, the target disease recurrence prediction model attends to the sequential relationship between the disease statistical data at the time nodes of the target time series data set, which helps improve the accuracy of the prediction result.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the application.
  • the terminal device may include a processor and a memory.
  • the terminal device may also include an output device.
  • the terminal device may include: one or more processors 701; one or more output devices 702 and a memory 703.
  • the aforementioned processor 701, output device 702, and memory 703 are connected by a bus.
  • the memory 703 is configured to store a computer program, and the computer program includes program instructions, and the processor 701 is configured to execute the program instructions stored in the memory 703, and perform the following operations:
  • acquire a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
  • input the target time series data set into a target disease recurrence prediction model.
  • the target disease recurrence prediction model includes n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t; call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module,
  • and determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n];
  • word embedding is performed, through the t-th recurrent neural network cell mask group, on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group;
  • the t-th memory cell is updated to obtain the t-th target memory cell C_t, and C_t is input into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group;
  • the processor 701 is specifically configured to call the t-th recurrent neural network cell mask group to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
  • the processor 701 is specifically configured to call the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to it, to obtain and output the t-th feature information h_t.
  • the target disease recurrence prediction model further includes an attention mechanism layer.
  • the processor 701 is further configured to analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, generate the t-th target weight w_t for h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group.
  • the (t+1)-th recurrent neural network cell mask group is then called to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
  • the prediction result of the target patient's recurrence of the target disease includes the probability of the target patient's recurrence of the target disease at the respective preset time nodes.
  • the processor 701 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 703 may include a read-only memory and a random access memory, and provides instructions and data to the processor 701. A part of the memory 703 may also include a non-volatile random access memory.
  • the processor 701, output device 702, and memory 703 described in the embodiment of this application can execute the implementation described in the data processing method provided in the embodiment of this application, and can also execute the implementation of the data processing apparatus described in the embodiment of this application, which is not repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium that stores a computer program; the computer program includes program instructions which, when executed by a processor, can perform the steps performed in the above data processing method embodiment.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes computer program code.
  • when the computer program code runs on a computer, the computer is caused to execute the steps performed in the above-mentioned data processing method embodiment.
  • the program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function, etc.; the storage data area may store data created through use of the blockchain node, etc.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

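A data processing system, method, apparatus and storage medium, applied in the field of medical technology. The data processing system includes a storage device and a terminal device. The storage device is used to store the time series data sets of at least one patient associated with a target disease. The terminal device is used to acquire the target time series data set of a target patient from the storage device and to determine, through a target disease recurrence prediction model and according to the target time series data set, the prediction result of the target patient's recurrence of the target disease, which helps improve the accuracy of that prediction result. The application also relates to blockchain technology: the patients' time series data sets can be written into a blockchain.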

Description

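A data processing system, method, apparatus and storage medium
This application claims priority to the Chinese patent application No. 202011213492.4, filed with the China Patent Office on November 4, 2020 and entitled "A data processing system, method, apparatus and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data analysis technology, is specifically applied to the field of medical technology, and specifically relates to a data processing system, a data processing method, a data processing apparatus, a terminal device, and a computer-readable storage medium.
Background
The inventor realized that, in the field of medical technology, a patient who has suffered from a certain disease (the target disease) may still relapse after being cured, and that the probability of recurrence increases if the patient takes no preventive measures in advance. Accurately predicting recurrence of the target disease is very important: it can give clinicians decision support, enabling early prevention and targeted treatment plans.
How to accurately predict the likelihood that a target patient relapses into the target disease has therefore become an urgent problem.
Summary
The embodiments of this application provide a data processing system, method, apparatus and storage medium that can improve the accuracy of the prediction result of a target patient's recurrence of a target disease.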
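A first aspect of the embodiments of this application provides a data processing system including a storage device and a terminal device, wherein:
the storage device is used to store the time series data sets of at least one patient associated with a target disease, the time series data set of any patient including the disease statistical data associated with the target disease at each time node for that patient;
the terminal device is used to acquire a target time series data set of a target patient from the time series data sets stored in the storage device, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
the terminal device is further used to input the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
the terminal device is further used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n]; and to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.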
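A second aspect of the embodiments of this application provides a data processing method, including:
acquiring a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
inputting the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t; calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m∈[2, n]; and determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.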
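A third aspect of the embodiments of this application provides a data processing apparatus, including:
an acquiring module, used to acquire a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
a processing module, used to input the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
the processing module being further used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n];
the processing module being further used to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.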
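A fourth aspect of the embodiments of this application provides a terminal device including a processor and a memory connected to each other, wherein the memory is used to store a computer program, the computer program includes a program, and the processor is configured to call the program to execute the following method:
acquiring a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
inputting the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t; calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m∈[2, n]; and determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.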
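A fifth aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a processor, cause the processor to execute the following method:
acquiring a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
inputting the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t; calling the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting the m-th feature information h_m based on the word embedding result, m∈[2, n]; and determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.
In the embodiments of this application, the target disease recurrence prediction model attends to the sequential relationship between the disease statistical data at the time nodes of the target time series data set, which helps improve the accuracy of the prediction result of the target patient's recurrence of the target disease.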
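Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic architecture diagram of a data processing system provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a target disease recurrence prediction model provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of another target disease recurrence prediction model provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of yet another target disease recurrence prediction model provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a terminal device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The technical solutions of this application may involve the fields of artificial intelligence and/or big data, for example neural network technology, and may be applied to scenarios such as smart healthcare, so as to realize digital healthcare and promote the construction of smart cities. Optionally, the data involved in this application, such as the time series data, feature information and/or prediction results, may be stored in a database or in a blockchain, which is not limited in this application.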
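Referring to FIG. 1, an embodiment of this application proposes a data processing system including a storage device and a terminal device. The terminal device here may be any of the following: portable devices such as smartphones, tablet computers and laptop computers, as well as desktop computers, and so on. Specifically:
The storage device is used to store the time series data sets of at least one patient associated with the target disease; the time series data set of any patient includes the disease statistical data associated with the target disease at each time node for that patient (that is, the target historical medical data of the at least one patient associated with the target disease). Here, any such patient is a patient who has had or currently has the target disease, and the patient's disease statistical data may include the patient's laboratory tests and demographic characteristics, as well as some time series factors. The target disease may be disease 1, disease 2 (for example myocardial infarction), disease 3 (for example heart failure), and so on. Suppose the target disease is disease 1. Experimental results show that the hemoglobin content in the blood, the interval between the physical examination date and the remission time of disease 1, IgG staining intensity, serum albumin, quantitative urine protein, and the remission type of disease 1 are all relatively important features and are crucial for predicting the recurrence of disease 1. Therefore, the disease statistical data associated with disease 1 may include: the patient's hemoglobin content in the blood, the interval between the physical examination date and the remission time of disease 1, IgG staining intensity, serum albumin, quantitative urine protein, the remission type of disease 1, and so on.
The storage device may be a server corresponding to a patient monitoring system, or a node device in a blockchain network. In one embodiment, the historical medical data generated by each patient's visits, treatments, laboratory tests and surgeries related to the target disease can all be uploaded to the patient monitoring system. The patient monitoring system can assemble each patient's disease statistical data associated with the target disease at each time node into a time series data set, and store the time series data sets of the patients on the server.
Alternatively, after receiving each patient's disease statistical data associated with the target disease at each time node, the patient monitoring system can send that data to a node device in the blockchain network. The node device can assemble each patient's disease statistical data at each time node into a time series data set and write the patients' time series data sets into the blockchain. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.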
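The block-linking idea described above can be illustrated with a minimal hash chain. This is only a sketch under stated assumptions: the function names (`block_hash`, `append_block`, `is_valid`), the JSON payload layout and the use of SHA-256 are illustrative choices, not details given in the application, and a real blockchain network would add consensus and peer-to-peer replication.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder back-link for the first block (assumption)

def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "data": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a block whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "data": payload,
                  "hash": block_hash(index, prev_hash, payload)})
    return chain

def is_valid(chain):
    """Check every block's stored hash and its link to the preceding block."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["data"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous one, altering a stored time series data set after the fact makes `is_valid` fail, which is the anti-counterfeiting property the text refers to.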
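The terminal device is used to acquire the target time series data set of the target patient from the time series data sets stored in the storage device, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n].
The terminal device is further used to input the target time series data set into the target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t.
The terminal device is further used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n]; and to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.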
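In one embodiment, the n feature processing modules include n recurrent neural network cell mask groups and n hidden layers, the t-th recurrent neural network cell mask group of the n groups corresponding one-to-one with the t-th hidden layer of the n hidden layers. If t=m, the terminal device is further specifically used to:
perform word embedding, through the t-th recurrent neural network cell mask group, on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group; update the t-th memory cell to obtain the t-th target memory cell C_t, and input C_t into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group;
perform feature compression on the t-th target memory cell C_t through the t-th hidden layer, to obtain and output the t-th feature information h_t.
In one embodiment, the terminal device is specifically used to call the t-th recurrent neural network cell mask group to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
In one embodiment, the terminal device is specifically used to call the t-th hidden layer to perform feature compression, according to the output gate function, on the t-th target memory cell C_t input to it, to obtain and output the t-th feature information h_t.
In one embodiment, the target disease recurrence prediction model further includes an attention mechanism layer, and the terminal device is further used to analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, generate the t-th target weight w_t for h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group; and to call the (t+1)-th recurrent neural network cell mask group to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
In one embodiment, the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses into the target disease at each time node.
In the embodiments of this application, for the specific implementation of the terminal device, refer to the related description in the embodiment corresponding to FIG. 2 below, which is not repeated here.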
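Referring to FIG. 2, which is a schematic flowchart of a data processing method provided by an embodiment of this application. The data processing method can be executed by the terminal device in the data processing system described above and includes the following steps:
S201. Acquire a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1. The t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n].
In one embodiment, the target disease may be heart failure, myocardial infarction, sepsis, and so on. The terminal device can exchange data with the storage device, which may be a server corresponding to a monitoring system or a node device in a blockchain network. The storage device stores in advance the time series data sets of at least one patient associated with the target disease; the time series data set of any patient includes the disease statistical data associated with the target disease at each time node for that patient.
In one embodiment, the terminal device runs an application corresponding to a disease recurrence prediction platform, or can open a web page corresponding to the platform. Any user (a patient or a doctor) can log in to the disease recurrence prediction platform and submit a disease recurrence prediction request for the target patient. The request includes the identity information of the target patient and the identification information of the target disease, where the identity information of the target patient may be the target patient's ID document number, or a medical record number that uniquely identifies the target patient, and so on.
Further, after detecting the disease recurrence prediction request submitted by the user, the terminal device can, based on the identity information and the identification information of the target disease, acquire the target time series data set of the target patient from the time series data sets of at least one patient associated with the target disease stored in the storage device.
Alternatively, in another embodiment, the time series data sets of at least one patient associated with the target disease may be written into a blockchain in advance, and the terminal device can subsequently acquire the target time series data set of the target patient from the blockchain through a node device in the blockchain network.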
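S202. Input the target time series data set into the target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t.
In one embodiment, the network structure of the target disease recurrence prediction model may be as shown in FIG. 3: the model includes n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t. The input of the model is a time series, and the output prediction result may also be a time series. The prediction result may be a sequence [y_1, y_2, y_3, …, y_k] representing the predicted recurrence of the patient's target disease at each time node. For example, with the year as the time unit and k=5, the sequence indicates: no recurrence in the first future year, no recurrence in the second future year, no recurrence in the third future year, no recurrence in the fourth future year, and recurrence in the fifth future year.
Before the target disease recurrence prediction model is applied to predict disease recurrence for the target patient, time series sample data sets of a large number of sample patients can be collected. The time series sample data set of any sample patient includes that patient's disease statistical sample data associated with the target disease at each time node (the data types are the same as for the disease statistical data above; only the specific content or values differ between patients). Further, the time series sample data sets of the sample patients can be used as training samples, and a label is added to each training sample.
Specifically, for the prediction target of each individual (i.e. patient), the embodiment of this application retains two variables. One variable, time, represents the time of follow-up censoring; the other variable, event, represents whether the event occurred (i.e. whether the target disease recurred): event is 1 for recurrence and 0 for no recurrence. Since the maximum follow-up time is 26 years, the embodiment can set the predicted time series to 27 time points, each representing 1 year, for 27 years in total. In this case, during training a label can be added to each training sample. The label can be T(e), where T denotes the year, and e is 1 in the case of recurrence and 0 in the case of no recurrence or follow-up censoring. When the model is subsequently trained on such samples, its output will be a time series output (i.e. y_1, y_2, …, y_k in FIG. 3).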
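One possible way to turn the two variables (time, event) into a per-year training target is sketched below. The encoding is an assumption made for illustration: the application only states that the label is T(e) over 27 yearly time points, so the 1-based year convention, the `make_label` name and the extra `observed` mask (marking years for which follow-up information exists) are hypothetical.

```python
HORIZON = 27  # 27 yearly time points, as stated in the text

def make_label(time, event, horizon=HORIZON):
    """Build per-year targets and an observation mask from (time, event).

    time  -- 1-based year of recurrence or of the last follow-up (assumed convention)
    event -- 1 if the target disease recurred in that year, 0 if censored or no recurrence
    """
    if not 1 <= time <= horizon:
        raise ValueError("time must lie in [1, horizon]")
    if event not in (0, 1):
        raise ValueError("event must be 0 or 1")
    # Years up to and including `time` carry information; later years do not.
    observed = [1 if year <= time else 0 for year in range(1, horizon + 1)]
    targets = [0] * horizon
    if event == 1:
        targets[time - 1] = 1  # recurrence recorded in year `time`
    return targets, observed
```

Under this encoding a censored patient still contributes the years in which no recurrence was observed, which is one way to read the text's point that censored patients can serve as training samples.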
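Here, follow-up censoring means the following: a patient with the target disease is followed up regularly after the disease is cured, and the doctor can learn about the patient's health from the follow-up records. Some patients, however, may be followed up for only a few years (for example 3 years) without the target disease recurring in those years, after which there are no further data records. For a sample patient with follow-up censoring, whether the target disease recurred cannot actually be determined, and at present such patients cannot be used as training samples for the target disease recurrence prediction model. With this scheme, however, labels are added through the two variables, so even sample patients with follow-up censoring can serve as training samples, which helps handle the censoring problem and reduces the difficulty of selecting samples.
Further, a large number of labeled training samples can be used to train the target disease recurrence prediction model. After training is complete, whenever the target time series data set of any patient is input into the model, the model can output the prediction result of the target patient's recurrence of the target disease. The prediction result may be a time series, such as [y_1, y_2, y_3, …, y_k] in FIG. 3, representing the probability that the target patient relapses into the target disease at each preset time node. For example, with the year as the time unit and k=5, the preset time nodes, counted from the current time, are the first, second, third, fourth and fifth future years; if the current time is 2020, the preset time nodes are 2021, 2022, 2023, 2024 and 2025. The above time series indicates: no recurrence in the first future year (i.e. 2021; probability 0), no recurrence in the second future year (i.e. 2022; probability 0), no recurrence in the third future year (i.e. 2023; probability 0), no recurrence in the fourth future year (i.e. 2024; probability 0), and recurrence in the fifth future year (i.e. 2025; probability 1).
Alternatively, the prediction result may also be obtained by summing the values y_a (a∈[1, k]) at every preset time node of the time series [y_1, y_2, y_3, …, y_k], giving one overall recurrence probability S(k).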
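Written out, the summation described above would read as follows; the closed form is inferred from the sentence itself, and the worked value uses the five-year example sequence from the text.

```latex
S(k) = \sum_{a=1}^{k} y_a ,
\qquad \text{e.g. } [y_1,\dots,y_5] = [0,0,0,0,1] \;\Rightarrow\; S(5) = 0+0+0+0+1 = 1 .
```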
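S203. Call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n].
In one embodiment, the n feature processing modules include n recurrent neural network cell mask groups and n hidden layers, the t-th recurrent neural network cell mask group of the n groups corresponding one-to-one with the t-th hidden layer of the n hidden layers. The recurrent neural network cell mask group may be, for example, an LSTM (Long Short-Term Memory) cell mask group.
In a specific implementation, suppose t=m. Any m-th feature processing module (i.e. any of the n-1 feature processing modules other than the first) performs word embedding and determines and outputs the m-th feature information h_m as follows: word embedding is performed, through the t-th recurrent neural network cell mask group, on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group; the t-th memory cell is updated to obtain the t-th target memory cell C_t, and C_t is input into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group. Further, the t-th hidden layer performs feature compression on C_t to obtain and output the t-th feature information h_t.
The t-th memory cell obtained by this word embedding of x_t and h_{t-1} may be computed according to Equation 1.1. In Equation 1.1, w_j and u_j are both control gate functions used to control the proportion of the corresponding information that is retained: for example, w_j controls the proportion of the (t-1)-th feature information h_{t-1} that is retained, and u_j controls the proportion of the target disease statistical data x_t input to the t-th recurrent neural network cell mask group that is retained. b_j is a bias matrix, which is fixed once model training is complete.
Figure PCTCN2021084227-appb-000008    (Equation 1.1)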
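In one embodiment, the t-th memory cell is updated to obtain the t-th target memory cell C_t as follows: the t-th recurrent neural network cell mask group is called to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
This update of the t-th memory cell according to the forget gate function and the input gate function may be computed according to Equation 1.2.
Figure PCTCN2021084227-appb-000012    (Equation 1.2)
Here, f_t is the forget gate function, i_t is the input gate function, and C_{t-1} is the (t-1)-th target memory cell, i.e. the target memory cell obtained by the recurrent neural network cell mask group immediately preceding the t-th one (the (t-1)-th recurrent neural network cell mask group).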
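Equations 1.1 and 1.2 survive here only as image references. As a hedged reading, assuming the model follows the standard LSTM formulation, they would take roughly the following form. The symbol \tilde{C}_t for the t-th memory cell and the tanh activation in the first line are assumptions not stated in the text; only the roles of w_j, u_j, b_j, f_t, i_t and C_{t-1} are taken from the surrounding description.

```latex
\tilde{C}_t = \tanh\!\left(w_j\, h_{t-1} + u_j\, x_t + b_j\right)
  \qquad \text{(assumed form of Equation 1.1)}

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
  \qquad \text{(assumed form of Equation 1.2)}
```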
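f_t and i_t are computed according to Equation 1.3 and Equation 1.4, in which σ denotes the sigmoid activation function and each w and b denotes the corresponding weight matrix and bias.
f_t = σ(w_f · [h_{t-1}, x_t] + b_f)    (Equation 1.3)
i_t = σ(w_i · [h_{t-1}, x_t] + b_i)    (Equation 1.4)
In one embodiment, the terminal device may call the t-th hidden layer to perform feature compression, according to the output gate function, on the t-th target memory cell C_t input to it, to obtain and output the t-th feature information h_t.
Exemplarily, the feature compression described above can be performed according to Equation 1.5 and Equation 1.6, where f_o is the output gate and h_t is the output of the t-th hidden layer.
f_o = σ(w_o · [h_{t-1}, x_t] + b_o)    (Equation 1.5)
h_t = f_o * tanh(C_t)    (Equation 1.6)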
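The gate computations of Equations 1.3 to 1.6 can be put together into a single time-step function. The sketch below uses plain Python lists so that it runs without third-party libraries. The candidate memory cell is computed with a tanh layer over the concatenation [h_{t-1}, x_t], which is the standard LSTM choice and an assumption here; the parameter names in `params` (including `w_c` and `b_c`) are likewise hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, vec):
    """Return weights @ vec + bias, with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One feature processing step: (x_t, h_{t-1}, C_{t-1}) -> (h_t, C_t)."""
    concat = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["w_f"], params["b_f"], concat)]  # Eq. 1.3
    i_t = [sigmoid(z) for z in affine(params["w_i"], params["b_i"], concat)]  # Eq. 1.4
    f_o = [sigmoid(z) for z in affine(params["w_o"], params["b_o"], concat)]  # Eq. 1.5
    # Candidate memory cell: assumed standard-LSTM form, not given in the text.
    cand = [math.tanh(z) for z in affine(params["w_c"], params["b_c"], concat)]
    # Forget part of the old cell and add the gated candidate (assumed Eq. 1.2).
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(f_o, c_t)]  # Eq. 1.6
    return h_t, c_t
```

Calling `lstm_step` once per time node, feeding each returned `h_t` and `c_t` into the next call, reproduces the chained structure in which the m-th module consumes x_m and h_{m-1}.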
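Exemplarily, the network structure of the target disease recurrence prediction model may be as shown in FIG. 4. Suppose n is 3, so that the target time series data set includes 3 target disease statistical data associated with the target disease at 3 time nodes of the target patient, namely x_1, x_2 and x_3. Then x_1 can be input into the first LSTM cell mask group, x_2 into the second LSTM cell mask group, and x_3 into the third LSTM cell mask group. First, the first LSTM cell mask group can extract the feature information of x_1 and input it into the first hidden layer. The first hidden layer can perform feature compression on the feature information to obtain the first feature information h_1, and input h_1 into the second LSTM cell mask group. The second LSTM cell mask group can perform word embedding on h_1 and its own input x_2 to obtain the second memory cell corresponding to the second LSTM cell mask group, update the second memory cell to obtain the second target memory cell C_2, and input C_2 into the second hidden layer corresponding to the second recurrent neural network cell mask group. Further, the second hidden layer performs feature compression on C_2 to obtain the second feature information h_2, and outputs h_2 to the third LSTM cell mask group. By analogy, the third LSTM cell mask group can perform word embedding on its own input x_3 and the second feature information h_2 to obtain the third memory cell, update the third memory cell to obtain the third target memory cell C_3, and input C_3 into the third hidden layer corresponding to the third recurrent neural network cell mask group. The third hidden layer performs feature compression on C_3 to obtain the third feature information h_3.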
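S204. Determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules.
In a specific implementation, the target disease recurrence prediction model also includes a fully connected layer. The terminal device can input the n-th feature information h_n, output by the n-th feature processing module of the n feature processing modules, into the fully connected layer, which performs data analysis on h_n to determine the prediction result of the target patient's recurrence of the target disease.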
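As an illustration of this last step, the sketch below maps h_n to k per-time-node probabilities with a single fully connected layer. The text does not specify the layer's activation, so the element-wise sigmoid is an assumption, as are the function name and the weight layout (one row of weights per preset time node).

```python
import math

def fully_connected(h_n, weights, bias):
    """Map the final feature information h_n to k recurrence probabilities.

    weights -- k rows, each with len(h_n) entries (hypothetical layout)
    bias    -- k entries
    """
    logits = [sum(w * h for w, h in zip(row, h_n)) + b
              for row, b in zip(weights, bias)]
    # Sigmoid keeps each output in (0, 1); this activation is assumed.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

With k=5 rows in `weights`, the returned list plays the role of the sequence [y_1, …, y_5] in the example that follows.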
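In one embodiment, the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses into the target disease at each preset time node. That is, the prediction result is a time series, for example [y_1, y_2, y_3, …, y_k]. With the year as the time unit and k=5, the preset time nodes, counted from the current time, are the first, second, third, fourth and fifth future years; if the current time is 2020, the preset time nodes are 2021, 2022, 2023, 2024 and 2025. The above time series indicates: no recurrence in the first future year (i.e. 2021; probability 0), no recurrence in the second future year (i.e. 2022; probability 0), no recurrence in the third future year (i.e. 2023; probability 0), no recurrence in the fourth future year (i.e. 2024; probability 0), and recurrence in the fifth future year (i.e. 2025; probability 1).
In another embodiment, the target disease recurrence prediction model further includes an attention mechanism layer. The terminal device can analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, determine the t-th target weight w_t corresponding to h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group; the (t+1)-th recurrent neural network cell mask group is then called to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
Exemplarily, the target disease recurrence prediction model after adding the attention mechanism layer may be as shown in FIG. 5.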
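For the target disease statistical data at different time nodes, the attention mechanism layer can assign different weights to the feature information at different time nodes along the time dimension: for example, a weight w_1 for h_1 at the first time node, a weight w_2 for h_2 at the second time node, and so on, up to a weight w_n for h_n at the n-th time node. It can be understood that, in the embodiments of this application, the feature information h_t obtained from the t-th target disease statistical data x_t at the t-th time node is the feature information h_t at the t-th time node. The weights of the different time nodes are fixed once model training is complete.
Alternatively, in another embodiment, the attention mechanism layer can assign different weights to the feature information at different time nodes along the variable dimension; that is, during training the attention mechanism layer has already learned how much weight to assign to each type of variable. In this case, the attention mechanism layer can determine the t-th target weight w_t corresponding to the t-th feature information h_t based on the analysis result.
In this case, after the model outputs the prediction result, the terminal device can obtain from the attention mechanism layer the weight assigned to the feature information h_t of each time node, and thereby determine the influence of the disease statistical data of each time node on the prediction result, which increases the interpretability of the model's output.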
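The two uses of the attention weights described above, rescaling h_t before it enters the next cell and reading the weights back for interpretation, can be sketched as follows. How the attention layer actually computes w_t is not specified in the text, so the weights are simply taken as given; both function names are hypothetical.

```python
def reweight(h_t, w_t):
    """Update the t-th feature information to h_t * w_t (scalar weight assumed)."""
    return [w_t * value for value in h_t]

def rank_time_nodes(weights):
    """Return 1-based time-node indices ordered from highest to lowest weight."""
    return sorted(range(1, len(weights) + 1),
                  key=lambda t: weights[t - 1], reverse=True)
```

For instance, with weights [0.2, 0.7, 0.1] over three time nodes, `rank_time_nodes` places the second time node first, suggesting that the statistics recorded at that node influenced the prediction most.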
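In the embodiments of this application, the target time series data set can be input into the target disease recurrence prediction model, and the m-th feature processing module in the model is called to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n]. Further, the prediction result of the target patient's recurrence of the target disease can be determined according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules. In this way, the target disease recurrence prediction model attends to the sequential relationship between the disease statistical data at the time nodes of the target time series data set, which helps improve the accuracy of the prediction result.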
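Referring to FIG. 6, which is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application. The data processing apparatus described in this embodiment can be configured in a terminal device and includes:
an acquiring module 60, used to acquire a target time series data set of a target patient, the target time series data set including n target disease statistical data associated with the target disease at n time nodes of the target patient, n being an integer greater than 1; wherein the t-th time node of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t∈[1, n];
a processing module 61, used to input the target time series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th feature processing module of the n feature processing modules being used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t;
the processing module 61 being further used to call the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n];
the processing module 61 being further used to determine the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules; wherein the first feature information h_1 is determined according to the first target disease statistical data x_1.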
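In one embodiment, the n feature processing modules include n recurrent neural network cell mask groups and n hidden layers, the t-th recurrent neural network cell mask group of the n groups corresponding one-to-one with the t-th hidden layer of the n hidden layers; with t=m, the processing module 61 is further specifically used to:
perform word embedding, through the t-th recurrent neural network cell mask group, on the target disease statistical data x_t input to the t-th recurrent neural network cell mask group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell corresponding to the t-th recurrent neural network cell mask group; update the t-th memory cell to obtain the t-th target memory cell C_t, and input C_t into the t-th hidden layer corresponding to the t-th recurrent neural network cell mask group;
perform feature compression on the t-th target memory cell C_t through the t-th hidden layer, to obtain and output the t-th feature information h_t.
In one embodiment, the processing module 61 is specifically used to call the t-th recurrent neural network cell mask group to update the t-th memory cell according to the forget gate function and the input gate function, to obtain the t-th target memory cell C_t.
In one embodiment, the processing module 61 is specifically used to call the t-th hidden layer to perform feature compression, according to the output gate function, on the t-th target memory cell C_t input to it, to obtain and output the t-th feature information h_t.
In one embodiment, the target disease recurrence prediction model further includes an attention mechanism layer, and the processing module 61 is further used to analyze, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer, generate the t-th target weight w_t for h_t based on the analysis result, update the t-th feature information h_t to h_t*w_t, and input h_t*w_t into the (t+1)-th recurrent neural network cell mask group; and to call the (t+1)-th recurrent neural network cell mask group to perform word embedding on the target disease statistical data x_{t+1} and the t-th feature information h_t*w_t input into it, to obtain the (t+1)-th memory cell corresponding to the (t+1)-th recurrent neural network cell mask group.
In one embodiment, the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses into the target disease at each preset time node.
It can be understood that each functional module of the data processing apparatus of this embodiment can be implemented according to the method of FIG. 2 in the above method embodiment; for the specific implementation process, refer to the related description of FIG. 2, which is not repeated here.
In the embodiments of this application, the data processing apparatus can input the target time series data set into the target disease recurrence prediction model, and call the m-th feature processing module in the model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m∈[2, n]. Further, the prediction result of the target patient's recurrence of the target disease can be determined according to the n-th feature information h_n output by the n-th feature processing module of the n feature processing modules. In this way, the target disease recurrence prediction model attends to the sequential relationship between the disease statistical data at the time nodes of the target time series data set, which helps improve the accuracy of the prediction result.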
请参阅图7,图7为本申请实施例提供的一种终端设备的结构示意图。该终端设备可以包括处理器和存储器。可选的,该终端设备还可包括输出设备。例如,该终端设备可以包括:一个或多个处理器701;一个或多个输出设备702和存储器703。上述处理器701、输出设备702和存储器703通过总线连接。存储器703用于存储计算机程序,所述计算机程序包括程序指令,处理器701用于执行存储器703存储的程序指令,执行以下操作:
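obtaining a target time-series data set of a target patient, the target time-series data set including n target disease statistical data associated with the target disease for the target patient at n time nodes, n being an integer greater than 1, where the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
inputting the target time-series data set into a target disease recurrence prediction model that includes n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain the t-th feature information h_t; and invoking the m-th feature processing module in the model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output the m-th feature information h_m based on the word embedding result, m ∈ [2, n];
determining the prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, where the first feature information h_1 is determined from the first target disease statistical data x_1.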
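The constraints on the target time-series data set in the first operation above (n time nodes, one statistics record per node, n greater than 1, 1-based index t) can be captured in a small validated container. The class name `TargetTimeSeries`, its field names, and the use of string labels for time nodes are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TargetTimeSeries:
    """Hypothetical container for one patient's target time-series data set."""
    patient_id: str
    time_nodes: Tuple[str, ...]                  # n time nodes, in order
    statistics: Tuple[Tuple[float, ...], ...]    # x_1 ... x_n, one per node

    def __post_init__(self) -> None:
        if len(self.time_nodes) != len(self.statistics):
            raise ValueError("each time node needs exactly one statistics record")
        if len(self.time_nodes) < 2:
            raise ValueError("n must be an integer greater than 1")

    @property
    def n(self) -> int:
        return len(self.time_nodes)

    def x(self, t: int) -> Tuple[float, ...]:
        """Return x_t using the 1-based index t in [1, n]."""
        if not 1 <= t <= self.n:
            raise IndexError("t must satisfy 1 <= t <= n")
        return self.statistics[t - 1]
```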
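In one embodiment, the n feature processing modules include n recurrent neural network cell groups and n hidden layers, where the t-th recurrent neural network cell group corresponds one-to-one to the t-th hidden layer, and t = m. The processor 701 is further specifically configured to:
perform, through the t-th recurrent neural network cell group, word embedding on the target disease statistical data x_t input to that cell group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell (Figure PCTCN2021084227-appb-000022) corresponding to the t-th recurrent neural network cell group;
update the t-th memory cell (Figure PCTCN2021084227-appb-000023) to obtain the t-th target memory cell C_t, and input the t-th target memory cell C_t to the t-th hidden layer corresponding to the t-th recurrent neural network cell group;
perform, through the t-th hidden layer, feature compression on the t-th target memory cell C_t, to obtain and output the t-th feature information h_t.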
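In one embodiment, the processor 701 is specifically configured to invoke the t-th recurrent neural network cell group to update the t-th memory cell (Figure PCTCN2021084227-appb-000024) according to a forget gate function and an input gate function, to obtain the t-th target memory cell C_t.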
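In one embodiment, the processor 701 is specifically configured to invoke the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to the t-th hidden layer, to obtain and output the t-th feature information h_t.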
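In one embodiment, the target disease recurrence prediction model further includes an attention mechanism layer. The processor 701 is further configured to: parse, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer; generate a t-th target weight w_t for the t-th feature information h_t based on the parsing result; update the t-th feature information h_t to h_t*w_t; and input h_t*w_t to the (t+1)-th recurrent neural network cell group. The (t+1)-th recurrent neural network cell group is then invoked to perform word embedding on the target disease statistical data x_{t+1} input to it and the t-th feature information h_t*w_t, to obtain the (t+1)-th memory cell (Figure PCTCN2021084227-appb-000025) corresponding to the (t+1)-th recurrent neural network cell group.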
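In one embodiment, the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses with the target disease at each preset time node.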
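It should be understood that, in the embodiments of this application, the processor 701 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 703 may include read-only memory and random access memory, and provides instructions and data to the processor 701. A part of the memory 703 may also include non-volatile random access memory.
In a specific implementation, the processor 701, the output device 702, and the memory 703 described in the embodiments of this application can carry out the implementations described for the data processing method provided herein, as well as the implementations described for the data processing device, which are not repeated here.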
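An embodiment of this application further provides a computer-readable storage medium that stores a computer program. The computer program includes program instructions which, when executed by a processor, can perform the steps performed in the above data processing method embodiments.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
An embodiment of this application further provides a computer program product that includes computer program code which, when run on a computer, causes the computer to perform the steps performed in the above data processing method embodiments.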
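A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area can store an operating system, an application program required by at least one function, and so on, and the data storage area can store data created through the use of blockchain nodes, and so on.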
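The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains a batch of network transaction information that is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.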
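The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain. This is a generic sketch, not the storage scheme of this application; the block layout, the all-zero genesis value, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(prev_hash: str, transactions: List[str]) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "tx": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> None:
    """Append a block whose hash commits to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append({
        "prev": prev_hash,
        "tx": list(transactions),
        "hash": block_hash(prev_hash, list(transactions)),
    })


def is_valid(chain: List[Dict]) -> bool:
    """Check every link and every stored hash; tampering breaks the chain."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["prev"], block["tx"]):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the transactions of any earlier block changes its recomputed hash, so validation fails from that block onward.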
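What is disclosed above is only a preferred embodiment of this application and certainly cannot be used to limit the scope of its claims. A person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made in accordance with the claims of this application still fall within the scope covered by the invention.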

Claims (20)

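  1. A data processing system, comprising a storage device and a terminal device, wherein:
    the storage device is configured to store time-series data sets associated with a target disease for at least one patient, the time-series data set of any patient including disease statistical data associated with the target disease for that patient at each time node;
    the terminal device is configured to obtain a target time-series data set of a target patient from the time-series data sets stored in the storage device, the target time-series data set including n target disease statistical data associated with the target disease for the target patient at n time nodes, n being an integer greater than 1, wherein the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
    the terminal device is further configured to input the target time-series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain t-th feature information h_t;
    the terminal device is further configured to invoke the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, to determine and output m-th feature information h_m based on the word embedding result, m ∈ [2, n], and to determine a prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, wherein the first feature information h_1 is determined from the first target disease statistical data x_1.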
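  2. A data processing method, wherein the method comprises:
    obtaining a target time-series data set of a target patient, the target time-series data set including n target disease statistical data associated with a target disease for the target patient at n time nodes, n being an integer greater than 1, wherein the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
    inputting the target time-series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain t-th feature information h_t;
    invoking the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting m-th feature information h_m based on the word embedding result, m ∈ [2, n];
    determining a prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, wherein the first feature information h_1 is determined from the first target disease statistical data x_1.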
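  3. The method according to claim 2, wherein the n feature processing modules include n recurrent neural network cell groups and n hidden layers, the t-th recurrent neural network cell group of the n recurrent neural network cell groups corresponds one-to-one to the t-th hidden layer of the n hidden layers, t = m, and any m-th feature processing module performing word embedding and determining and outputting the m-th feature information h_m based on the word embedding result comprises:
    performing, through the t-th recurrent neural network cell group, word embedding on the target disease statistical data x_t input to the t-th recurrent neural network cell group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell (Figure PCTCN2021084227-appb-100001) corresponding to the t-th recurrent neural network cell group;
    updating the t-th memory cell (Figure PCTCN2021084227-appb-100002) to obtain a t-th target memory cell C_t, and inputting the t-th target memory cell C_t to the t-th hidden layer corresponding to the t-th recurrent neural network cell group;
    performing, through the t-th hidden layer, feature compression on the t-th target memory cell C_t, to obtain and output the t-th feature information h_t.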
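  4. The method according to claim 3, wherein the updating the t-th memory cell (Figure PCTCN2021084227-appb-100003) to obtain a t-th target memory cell C_t comprises:
    invoking the t-th recurrent neural network cell group to update the t-th memory cell (Figure PCTCN2021084227-appb-100004) according to a forget gate function and an input gate function, to obtain the t-th target memory cell C_t.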
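  5. The method according to claim 3, wherein the performing, through the t-th hidden layer, feature compression on the t-th target memory cell C_t to obtain and output the t-th feature information h_t comprises:
    invoking the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to the t-th hidden layer, to obtain and output the t-th feature information h_t.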
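  6. The method according to claim 3, wherein the target disease recurrence prediction model further includes an attention mechanism layer, and the method further comprises:
    parsing, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer; determining, based on the parsing result, a t-th target weight w_t corresponding to the t-th feature information h_t; updating the t-th feature information h_t to h_t*w_t; inputting the h_t*w_t to the (t+1)-th recurrent neural network cell group; and invoking the (t+1)-th recurrent neural network cell group to perform word embedding on the target disease statistical data x_{t+1} input to the (t+1)-th recurrent neural network cell group and the t-th feature information h_t*w_t, to obtain the (t+1)-th memory cell (Figure PCTCN2021084227-appb-100005) corresponding to the (t+1)-th recurrent neural network cell group.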
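  7. The method according to claim 2, wherein the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses with the target disease at each preset time node.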
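  8. A data processing device, wherein the device comprises:
    an acquisition module, configured to obtain a target time-series data set of a target patient, the target time-series data set including n target disease statistical data associated with a target disease for the target patient at n time nodes, n being an integer greater than 1, wherein the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
    a processing module, configured to input the target time-series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain t-th feature information h_t;
    the processing module being further configured to invoke the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and to determine and output m-th feature information h_m based on the word embedding result, m ∈ [2, n];
    the processing module being further configured to determine a prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, wherein the first feature information h_1 is determined from the first target disease statistical data x_1.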
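  9. A terminal device, comprising a processor and a memory that are connected to each other, wherein the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following method:
    obtaining a target time-series data set of a target patient, the target time-series data set including n target disease statistical data associated with a target disease for the target patient at n time nodes, n being an integer greater than 1, wherein the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
    inputting the target time-series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain t-th feature information h_t;
    invoking the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting m-th feature information h_m based on the word embedding result, m ∈ [2, n];
    determining a prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, wherein the first feature information h_1 is determined from the first target disease statistical data x_1.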
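  10. The terminal device according to claim 9, wherein the n feature processing modules include n recurrent neural network cell groups and n hidden layers, the t-th recurrent neural network cell group of the n recurrent neural network cell groups corresponds one-to-one to the t-th hidden layer of the n hidden layers, t = m, and any m-th feature processing module performing word embedding and determining and outputting the m-th feature information h_m based on the word embedding result comprises:
    performing, through the t-th recurrent neural network cell group, word embedding on the target disease statistical data x_t input to the t-th recurrent neural network cell group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell (Figure PCTCN2021084227-appb-100006) corresponding to the t-th recurrent neural network cell group;
    updating the t-th memory cell (Figure PCTCN2021084227-appb-100007) to obtain a t-th target memory cell C_t, and inputting the t-th target memory cell C_t to the t-th hidden layer corresponding to the t-th recurrent neural network cell group;
    performing, through the t-th hidden layer, feature compression on the t-th target memory cell C_t, to obtain and output the t-th feature information h_t.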
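  11. The terminal device according to claim 10, wherein performing the updating of the t-th memory cell (Figure PCTCN2021084227-appb-100008) to obtain a t-th target memory cell C_t comprises:
    invoking the t-th recurrent neural network cell group to update the t-th memory cell (Figure PCTCN2021084227-appb-100009) according to a forget gate function and an input gate function, to obtain the t-th target memory cell C_t.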
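  12. The terminal device according to claim 10, wherein performing the feature compression on the t-th target memory cell C_t through the t-th hidden layer to obtain and output the t-th feature information h_t comprises:
    invoking the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to the t-th hidden layer, to obtain and output the t-th feature information h_t.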
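  13. The terminal device according to claim 10, wherein the target disease recurrence prediction model further includes an attention mechanism layer, and the processor is further configured to perform:
    parsing, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer; determining, based on the parsing result, a t-th target weight w_t corresponding to the t-th feature information h_t; updating the t-th feature information h_t to h_t*w_t; inputting the h_t*w_t to the (t+1)-th recurrent neural network cell group; and invoking the (t+1)-th recurrent neural network cell group to perform word embedding on the target disease statistical data x_{t+1} input to the (t+1)-th recurrent neural network cell group and the t-th feature information h_t*w_t, to obtain the (t+1)-th memory cell (Figure PCTCN2021084227-appb-100010) corresponding to the (t+1)-th recurrent neural network cell group.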
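  14. The terminal device according to claim 9, wherein the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses with the target disease at each preset time node.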
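  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the following method:
    obtaining a target time-series data set of a target patient, the target time-series data set including n target disease statistical data associated with a target disease for the target patient at n time nodes, n being an integer greater than 1, wherein the t-th of the n time nodes corresponds to the t-th target disease statistical data x_t of the n target disease statistical data, t ∈ [1, n];
    inputting the target time-series data set into a target disease recurrence prediction model, the target disease recurrence prediction model including n feature processing modules, the t-th of which is used to perform feature extraction on the t-th target disease statistical data x_t to obtain t-th feature information h_t;
    invoking the m-th feature processing module in the target disease recurrence prediction model to perform word embedding on the target disease statistical data x_m input to the m-th feature processing module and the (m-1)-th feature information h_{m-1} output by the (m-1)-th feature processing module, and determining and outputting m-th feature information h_m based on the word embedding result, m ∈ [2, n];
    determining a prediction result of the target patient's recurrence of the target disease according to the n-th feature information h_n output by the n-th of the n feature processing modules, wherein the first feature information h_1 is determined from the first target disease statistical data x_1.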
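  16. The computer-readable storage medium according to claim 15, wherein the n feature processing modules include n recurrent neural network cell groups and n hidden layers, the t-th recurrent neural network cell group of the n recurrent neural network cell groups corresponds one-to-one to the t-th hidden layer of the n hidden layers, t = m, and any m-th feature processing module performing word embedding and determining and outputting the m-th feature information h_m based on the word embedding result comprises:
    performing, through the t-th recurrent neural network cell group, word embedding on the target disease statistical data x_t input to the t-th recurrent neural network cell group and the (t-1)-th feature information h_{t-1} output by the (t-1)-th hidden layer, to obtain the t-th memory cell (Figure PCTCN2021084227-appb-100011) corresponding to the t-th recurrent neural network cell group;
    updating the t-th memory cell (Figure PCTCN2021084227-appb-100012) to obtain a t-th target memory cell C_t, and inputting the t-th target memory cell C_t to the t-th hidden layer corresponding to the t-th recurrent neural network cell group;
    performing, through the t-th hidden layer, feature compression on the t-th target memory cell C_t, to obtain and output the t-th feature information h_t.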
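  17. The computer-readable storage medium according to claim 16, wherein performing the updating of the t-th memory cell (Figure PCTCN2021084227-appb-100013) to obtain a t-th target memory cell C_t comprises:
    invoking the t-th recurrent neural network cell group to update the t-th memory cell (Figure PCTCN2021084227-appb-100014) according to a forget gate function and an input gate function, to obtain the t-th target memory cell C_t.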
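  18. The computer-readable storage medium according to claim 16, wherein performing the feature compression on the t-th target memory cell C_t through the t-th hidden layer to obtain and output the t-th feature information h_t comprises:
    invoking the t-th hidden layer to perform feature compression, according to an output gate function, on the t-th target memory cell C_t input to the t-th hidden layer, to obtain and output the t-th feature information h_t.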
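  19. The computer-readable storage medium according to claim 16, wherein the target disease recurrence prediction model further includes an attention mechanism layer, and the program instructions, when executed by the processor, further cause the processor to perform:
    parsing, through the attention mechanism layer, the t-th feature information h_t input to the attention mechanism layer; determining, based on the parsing result, a t-th target weight w_t corresponding to the t-th feature information h_t; updating the t-th feature information h_t to h_t*w_t; inputting the h_t*w_t to the (t+1)-th recurrent neural network cell group; and invoking the (t+1)-th recurrent neural network cell group to perform word embedding on the target disease statistical data x_{t+1} input to the (t+1)-th recurrent neural network cell group and the t-th feature information h_t*w_t, to obtain the (t+1)-th memory cell (Figure PCTCN2021084227-appb-100015) corresponding to the (t+1)-th recurrent neural network cell group.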
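  20. The computer-readable storage medium according to claim 15, wherein the prediction result of the target patient's recurrence of the target disease includes the probability that the target patient relapses with the target disease at each preset time node.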
PCT/CN2021/084227 2020-11-04 2021-03-31 一种数据处理系统、方法、装置及存储介质 WO2021190661A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011213492.4 2020-11-04
CN202011213492.4A CN112102950B (zh) 2020-11-04 2020-11-04 一种数据处理系统、方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2021190661A1 true WO2021190661A1 (zh) 2021-09-30

Family

ID=73784527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084227 WO2021190661A1 (zh) 2020-11-04 2021-03-31 一种数据处理系统、方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN112102950B (zh)
WO (1) WO2021190661A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495498A (zh) * 2022-09-23 2022-12-20 共青科技职业学院 数据关联方法、系统、电子设备及存储介质
CN117831789A (zh) * 2024-03-05 2024-04-05 北京市肿瘤防治研究所 癌症治疗反应预测系统及其控制方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102950B (zh) * 2020-11-04 2021-02-12 平安科技(深圳)有限公司 一种数据处理系统、方法、装置及存储介质
CN112528009A (zh) * 2020-12-07 2021-03-19 北京健康有益科技有限公司 生成用户慢病调理方案的方法、装置及计算机可读介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170177809A1 (en) * 2015-12-16 2017-06-22 Alegeus Technologies, Llc Systems and methods for reducing resource consumption via information technology infrastructure
CN108417272A (zh) * 2018-02-08 2018-08-17 合肥工业大学 带时序约束的相似病例推荐方法及装置
CN109493975A (zh) * 2018-12-20 2019-03-19 广州天鹏计算机科技有限公司 基于xgboost模型的慢性病复发预测方法、装置和计算机设备
CN109659033A (zh) * 2018-12-18 2019-04-19 浙江大学 一种基于循环神经网络的慢性疾病病情变化事件预测装置
CN110459324A (zh) * 2019-06-27 2019-11-15 平安科技(深圳)有限公司 基于长短期记忆模型的疾病预测方法、装置和计算机设备
CN112102950A (zh) * 2020-11-04 2020-12-18 平安科技(深圳)有限公司 一种数据处理系统、方法、装置及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916658B2 (en) * 2013-09-19 2018-03-13 Keio University Disease analysis apparatus, control method, and program
KR102225894B1 (ko) * 2018-04-24 2021-03-11 네이버 주식회사 딥 어텐션 네트워크를 이용하여 환자 의료 기록으로부터 질병 예후를 예측하는 방법 및 시스템
CN112289442B (zh) * 2018-10-29 2024-05-03 南京医基云医疗数据研究院有限公司 预测疾病终点事件的方法、装置及电子设备
CN109585020A (zh) * 2018-11-27 2019-04-05 华侨大学 一种运用卷积神经网络对疾病风险预测的模型
CN109493976A (zh) * 2018-12-20 2019-03-19 广州天鹏计算机科技有限公司 基于卷积神经网络模型的慢性病复发预测方法和装置
CN110119849B (zh) * 2019-05-21 2020-08-04 山东大学 一种基于网络行为的人格特质预测方法及系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495498A (zh) * 2022-09-23 2022-12-20 共青科技职业学院 数据关联方法、系统、电子设备及存储介质
CN117831789A (zh) * 2024-03-05 2024-04-05 北京市肿瘤防治研究所 癌症治疗反应预测系统及其控制方法
CN117831789B (zh) * 2024-03-05 2024-05-28 北京市肿瘤防治研究所 癌症治疗反应预测系统及其控制方法

Also Published As

Publication number Publication date
CN112102950A (zh) 2020-12-18
CN112102950B (zh) 2021-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21776384

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21776384

Country of ref document: EP

Kind code of ref document: A1