CN117393149A - Time sequence data prediction method for lung nodule pathogenesis - Google Patents
Time sequence data prediction method for lung nodule pathogenesis
- Publication number
- CN117393149A (application CN202311409606.6A)
- Authority
- CN
- China
- Prior art keywords
- time sequence
- sequence data
- data set
- data
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the technical field of artificial intelligence and discloses a time sequence data prediction method for lung nodule onset. The method comprises: acquiring a time sequence data set containing a plurality of features of lung nodule onset, and performing translation interaction processing on the time sequence data of each feature to obtain an initial time sequence data set; performing data noise reduction and data enhancement processing on the time sequence data of each feature in the initial time sequence data set to obtain a final time sequence data set; and finally performing regression processing on the final time sequence data set with an extreme random tree to obtain a prediction result. On a public tuberculosis data set, the prediction method of the invention achieves high accuracy, both in processing ultra-long sequences and in parallelism.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a time sequence data prediction method for lung nodule onset.
Background
A time series is a type of data that measures how things change over time, and time series prediction uses data from a past period to predict information for a future period. Time series analysis is generally considered to originate from the autoregressive model proposed by the British statistician G. U. Yule in 1927. Since then, and especially after computer software and hardware became widespread, more and more researchers have recognized the importance of time series prediction, its range of application in industry and academia has kept expanding, and a new wave of time series prediction research has emerged. The main time series prediction methods at present, and their technical problems, are as follows:
1) Traditional time series prediction models, such as smoothing, trend fitting, combination models, AR, MA, ARMA, and ARIMA. These algorithms require the time series data to be stationary and are insensitive to nonlinear relationships.
2) Intelligent algorithms, such as the Prophet model proposed by Facebook in 2017. Prophet uses the idea of time series decomposition to build a regression model on the time axis for each component and optimizes the parameters within a Bayesian framework. These algorithms often need to be given sufficiently correct information and a prior distribution of the parameters.
3) Methods based on recurrent neural networks, which introduce the concept of time sequence by drawing on the long short-term memory (LSTM) neural network used for natural language context prediction in the NLP field. However, the LSTM algorithm performs poorly when predicting extremely long sequences.
Disclosure of Invention
The invention provides a time sequence data prediction method for lung nodule onset, which applies a time sequence data translation interaction mechanism to the time sequence data set and thereby solves the technical problem of insufficient connection between earlier and later data in a time sequence.
The invention can be realized by the following technical scheme:
A time sequence data prediction method for lung nodule onset comprises the following steps:
Acquiring a time sequence data set containing a plurality of characteristics of lung nodule onset, and carrying out translation interaction processing on the time sequence data of each characteristic to acquire an initial time sequence data set;
respectively carrying out data noise reduction and data enhancement processing on the time sequence data of each feature in the initial time sequence data set to obtain a final time sequence data set;
and finally, carrying out regression processing on the final time sequence data set by using the extreme random tree to obtain a prediction result.
Further, the time sequence data set is arranged as a matrix whose column vectors are the time sequence data of each feature, and the translation interaction processing of the time sequence data of each feature comprises the following steps:
1) Deep-copy the time sequence data set D to obtain D_shift;
2) If the number of translations n_diff is not 0, translate each feature column vector in the time sequence data set D by one feature unit and subtract 1 from n_diff;
3) Fill the null values generated by the translation with NaN, and add the new features generated by the translation to D_shift according to the corresponding dimensions;
4) Repeat steps 2) and 3) until n_diff is 0;
5) In D_shift, perform a linear operation between all the new features generated by the data translation at each moment and the features at the original moment;
6) Obtain the final data set and use it as the initial data set.
Further, judging each line of data one by one, if the characteristic value is NAN, marking the data segment at the moment, and finally deleting all marked data segments.
Further, the ensemble empirical mode decomposition (EEMD) method is used to perform noise reduction on the initial time sequence data set, and the TimeGAN generative adversarial network is then used to perform multi-dimensional data enhancement on the time sequence data of each feature in the noise-reduced initial time sequence data set, yielding the final time sequence data set.
A time sequence data prediction device based on the above time sequence data prediction method for lung nodule onset comprises:
A time sequence data set adopts a matrix structure taking time sequence data of each characteristic as a column vector;
the time sequence feature translation interaction module is used for carrying out data translation and interaction processing on each feature column vector in the time sequence data set to obtain an initial time sequence data set;
the data noise reduction module is used for carrying out noise reduction processing on the initial time sequence data set;
the data enhancement module is used for carrying out data enhancement on the initial time sequence data set after noise reduction to obtain a final time sequence data set;
the prediction module is used for predicting the final time sequence data set.
Furthermore, the data noise reduction module uses the ensemble empirical mode decomposition (EEMD) method to perform noise reduction on the initial time sequence data set, the data enhancement module uses the TimeGAN generative adversarial network to perform data enhancement on the noise-reduced initial time sequence data set, and the prediction module uses an extreme random tree to predict on the final time sequence data set.
The beneficial technical effects of the invention are as follows:
1) On a public tuberculosis data set, the prediction method of the invention achieves higher accuracy than the models described above, both in processing ultra-long sequences and in parallelism.
2) The problem of insufficient connection between earlier and later data in a sequence is addressed by time sequence feature translation interaction, which yields time sequence features favorable for prediction. Noise is then eliminated with the EEMD feature decomposition method, the data are enhanced with the TimeGAN generative adversarial network, and the result is input into the prediction model, where the regression mode of the ensemble ET (extreme random tree) model is used for prediction, giving high accuracy and a clear effect.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a schematic diagram of results before and after a translation interaction by the time series data translation interaction mechanism of the present invention;
FIG. 3 is a diagram showing a process of data noise reduction by EEMD method;
FIG. 4 is a schematic structural diagram of the time sequence data prediction device based on the time sequence data prediction method for lung nodule onset according to the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific examples.
As shown in fig. 1, the invention provides a time series data prediction method for lung nodule pathogenesis, which mainly comprises the following steps:
In step S101, a time sequence feature translation interaction mechanism is used to address the extreme random tree model's incomplete use of time sequence features and to add prior knowledge of the time sequence.
Specifically, time series data on each feature of pulmonary nodule disease are collected to construct a time sequence data set, denoted D. The data are divided into N dimensions in time order, where N is the number of time series data entries; each square in each column represents a feature, and each row represents a feature over the whole time period. As shown in FIG. 2, the time sequence data set is first deep-copied and translated N times in units of column vectors. The newly obtained features, namely the original features after decomposition, are then translated by N steps to generate new features, which are added to the deep-copied data set in the order of the corresponding dimensions, so that the total number of entries in the deep-copied data set expands to N^2. In FIG. 2, D'_N is the result of translating D_N by one feature unit. Finally, in the expanded D_shift, a linear operation is performed between the new features generated by the data translation at each moment and the features at the original moment to obtain the final data set.
The process of performing data translation interactions is as follows:
1) Deep-copy the time sequence data set D to obtain D_shift;
2) If the number of translations n_diff is not 0, translate each feature column vector in the time sequence data set D by one feature unit and subtract 1 from n_diff;
3) Fill the null values generated by the translation with NaN, and add the new features generated by the translation to D_shift according to the corresponding dimensions;
4) Repeat steps 2) and 3) until n_diff is 0;
5) In D_shift, perform a linear operation between all the new features generated by the data translation at each moment and the features at the original moment;
6) Obtain the final data set, which is used as the initial data set in the subsequent calculation.
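The six steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data set is held as a dictionary of feature columns, "one feature unit" is read as one time step, the unspecified "linear operation" of step 5 is taken to be the difference between the original and the shifted column, and the names `shift_interaction`, `_shiftK` and `_diffK` are hypothetical.

```python
import copy
import math


def shift_interaction(dataset, n_diff):
    """Expand `dataset` (feature name -> values ordered by time) with shifted
    copies of every column and their differences from the original column."""
    if n_diff < 0:
        raise ValueError("n_diff must be non-negative")
    shifted = copy.deepcopy(dataset)      # step 1: D_shift is a deep copy of D
    current = copy.deepcopy(dataset)      # working copy that is shifted repeatedly
    step = 0
    while n_diff != 0:                    # steps 2 to 4
        step += 1
        for name in dataset:
            # Shift by one time step; the vacated slot is filled with NaN.
            current[name] = [math.nan] + current[name][:-1]
            shifted[f"{name}_shift{step}"] = list(current[name])
        n_diff -= 1
    # Step 5 (assumed linear operation): original value minus shifted value.
    for name, column in dataset.items():
        for k in range(1, step + 1):
            lagged = shifted[f"{name}_shift{k}"]
            shifted[f"{name}_diff{k}"] = [x - y for x, y in zip(column, lagged)]
    return shifted                        # step 6: the initial data set


data = {"size": [1.0, 2.0, 4.0, 7.0]}
expanded = shift_interaction(data, n_diff=2)
print(expanded["size_diff1"])  # [nan, 1.0, 2.0, 3.0]
```

The NaN entries introduced here are the ones that the discriminator of step S102 later removes.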
In step S102, a time-series data discriminator is used to determine whether or not there is a null value in the data of all the time periods of the current feature, and if there is a null value, it is deleted.
Specifically, each line of data is judged one by one, if the characteristic value is NAN, the data segment at the moment is marked, and finally all marked data segments are deleted.
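A minimal sketch of this mark-then-delete pass, assuming the data set is a dictionary of equally long feature columns holding numeric values; the function name `drop_rows_with_nan` is hypothetical.

```python
import math


def drop_rows_with_nan(dataset):
    """Return a copy of `dataset` (feature name -> list of values) without the
    time steps in which at least one feature value is NaN."""
    names = list(dataset)
    n_rows = len(dataset[names[0]]) if names else 0
    # First pass: mark every row (time step) that contains a NaN.
    marked = {
        i for i in range(n_rows)
        if any(math.isnan(dataset[name][i]) for name in names)
    }
    # Second pass: delete all marked rows from every column.
    return {
        name: [v for i, v in enumerate(dataset[name]) if i not in marked]
        for name in names
    }
```

Rows are removed across all columns at once, so the remaining columns stay aligned in time.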
In step S103, the initial time series data is noise-reduced.
Because the dimensions of the input time sequences are inconsistent, and noise is present when the dimension of an input time sequence is small, the ensemble empirical mode decomposition (EEMD) method is selected to further eliminate noise and strengthen the features.
Specifically, in this embodiment, the basic flow of the EEMD method is as follows:
First, the number of noise additions T is determined, and white noise w_i is added to each dimension of the input features that many times; the white noise is given by formula (1). For each feature dimension s_T(t), the empirical mode decomposition algorithm is used to obtain each P-th order component A[s_T(t)], and the average residual component r_P of order P is then obtained with a mean formula; the specific algorithm is given by formulas (2) and (3). The average IMF (Intrinsic Mode Function) component C_P of order P is obtained from formula (4). Finally, let s(t) = r_P; by analogy, the IMF components of orders 1 to P-1 are obtained in turn until r_P can no longer be decomposed, that is, r_P satisfies the termination condition and the signal is monotonic.
The result of the decomposition is the set C given by formula (5), where C is the set of IMF components of orders 1 to P-1; C is passed as input to the data enhancement module.
$$ w_i = (-1)^{q}\, \varepsilon\, v_i(t), \quad q \in [1, 2], \quad i \in [1, T] \tag{1} $$
where $w_i$, $i \in [1, T]$, is the added random white noise, $\varepsilon$ is the standard white noise, and $v_i(t)$ is white noise that also follows a normal distribution.
$$ s_T(t) = s(t) + \beta_{P-1} E_P(w_T) \tag{2} $$
where $E_P(\cdot)$ is the P-th order empirical mode decomposition component generated by the empirical mode decomposition algorithm, and $\beta_i$, $i \in [1, P-1]$, are constant coefficients.
In formula (3), $A[\cdot]$ is the mapping that returns the k-th order residual left after the empirical mode decomposition extracts the k-th order IMF component.
$$ C_P(t) = s(t) - r_P \tag{4} $$
$$ C = \{ C_j \}, \quad j \in [1, P-1] \tag{5} $$
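The noise-addition step of formulas (1) and (2) can be illustrated with a short sketch. It rests on several assumptions that the text does not state: q takes the values 1 and 2 for two consecutive trials that share the same v_i(t), so the noise is added in plus/minus pairs; a single constant `beta` stands in for the coefficients; and the noise is added directly to the signal instead of first passing through E_P. The empirical mode decomposition itself is omitted, and the function names are hypothetical.

```python
import random


def noisy_copies(signal, n_trials, epsilon, beta=1.0, seed=0):
    """Build n_trials noisy copies s(t) + beta * w_i(t) of `signal`, where
    w_i(t) = (-1)**q * epsilon * v_i(t) and v_i(t) is standard normal noise."""
    if n_trials <= 0 or n_trials % 2:
        raise ValueError("n_trials must be a positive even number")
    rng = random.Random(seed)
    copies = []
    for _ in range(n_trials // 2):
        v = [rng.gauss(0.0, 1.0) for _ in signal]
        for q in (1, 2):                  # q = 1 gives -noise, q = 2 gives +noise
            sign = (-1) ** q
            copies.append([s + beta * sign * epsilon * vi
                           for s, vi in zip(signal, v)])
    return copies


def ensemble_mean(copies):
    """Average the copies point by point, as the mean formula does for the
    components obtained from each noisy copy."""
    return [sum(column) / len(column) for column in zip(*copies)]
```

Because each noise realization is used once with each sign, the added noise cancels in the ensemble mean; in the full method the averaging is applied to the components extracted from each noisy copy, not to the copies themselves.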
In step S104, the noise-reduced initial time sequence data set is enhanced with the TimeGAN generative adversarial network to enrich the data, yielding the final time sequence data set.
The data are enhanced with the TimeGAN generative adversarial network algorithm. TimeGAN uses the original data as a step-wise supervised loss, which lets the model capture the conditional distributions in the time sequence data. It also introduces an embedding network that provides a reversible mapping between features and latent representations, which reduces the high dimensionality of the generative adversarial learning space. Finally, the embedding network is trained jointly with the supervised loss to generate realistic time sequence data.
Specifically, in this embodiment, the parameters to be provided to the TimeGAN generative adversarial network are defined according to the requirements, as shown in Table 1 below.
TABLE 1
In step S105, prediction is performed using the regression mode of the extreme random tree.
Specifically, in this embodiment, the extreme random tree is chosen because it is sensitive to high-dimensional features, supports parallel computation, and can fully select useful features; because of its random sampling, the trained model has small variance and good generalization capability, and it is simple to implement in code. Before the regression algorithm of the extreme random tree is executed, the data enhanced and cleaned in the preceding steps are divided into a training set and a test set and then standardized. The generation of the extreme random tree is strongly influenced by three parameters: the random attribute P of a single tree node, the minimum number of samples N for splitting a tree node, and the model scale S. Grid search is therefore used for parameter optimization: the influence of different parameter values is measured through the mean square error, and the parameter combination with the lower error is considered better. The core of the implementation consists of four steps: sample selection, feature selection, decision tree construction, and extreme random tree prediction. The first three steps are iterated to form a forest, the test samples are input into the forest, each tree predicts the test samples, and the final result is obtained from the predictions of the individual trees. The results are shown in Table 2.
TABLE 2
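The regression and grid-search procedure described above can be sketched with a small pure-Python extreme random tree. This is an illustrative simplification, not the implementation used in the embodiment: `n_features`, `min_split` and `n_trees` are assumed to play the roles of the parameters P, N and S, every tree is grown on the full training set, standardization and the train/test split are left to the caller, and all function names are hypothetical. Libraries such as scikit-learn provide a full implementation (`ExtraTreesRegressor`).

```python
import random
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, n_features, min_split, rng):
    """Grow one tree: draw n_features random columns (at most the column count),
    one random cut point per column, and keep the cut with the lowest error."""
    if len(y) < min_split or min(y) == max(y):
        return fmean(y)                               # leaf: mean target
    best = None
    for col in rng.sample(range(len(X[0])), n_features):
        lo, hi = min(r[col] for r in X), max(r[col] for r in X)
        if lo == hi:
            continue                                  # constant column here
        cut = rng.uniform(lo, hi)
        left = [i for i, r in enumerate(X) if r[col] < cut]
        right = [i for i, r in enumerate(X) if r[col] >= cut]
        if not left or not right:
            continue
        score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, col, cut, left, right)
    if best is None:
        return fmean(y)
    _, col, cut, left, right = best
    return (col, cut,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_features, min_split, rng),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_features, min_split, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):                    # descend to a leaf
        col, cut, left, right = node
        node = left if row[col] < cut else right
    return node


def fit_forest(X, y, n_trees, n_features, min_split, seed=0):
    rng = random.Random(seed)
    return [build_tree(X, y, n_features, min_split, rng) for _ in range(n_trees)]


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return fmean([predict_tree(tree, row) for tree in forest])


def mse(forest, X, y):
    return fmean([(predict_forest(forest, r) - t) ** 2 for r, t in zip(X, y)])


def grid_search(X_train, y_train, X_test, y_test, grid):
    """Return (lowest_mse, params) over the (n_trees, n_features, min_split) grid."""
    results = []
    for n_trees, n_features, min_split in grid:
        forest = fit_forest(X_train, y_train, n_trees, n_features, min_split)
        results.append((mse(forest, X_test, y_test),
                        (n_trees, n_features, min_split)))
    return min(results)
```

Choosing cut points at random instead of searching for the optimal threshold is what distinguishes this tree from an ordinary regression tree, and averaging over many such trees keeps the variance of the final prediction small.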
In summary, compared with the existing time sequence prediction method, the method has the advantages that:
1) On a public tuberculosis data set, the model is more accurate than the models described above, both in processing very long sequences and in parallelism.
2) The problem of insufficient connection between earlier and later data in a sequence is addressed by time sequence feature interaction, which yields time sequence features favorable for prediction. Noise is eliminated with the EEMD feature decomposition method, the data are then enhanced by the TimeGAN module and input into the prediction stage, and the regression mode of the ET extreme random tree is used for prediction, with a remarkable effect.
Next, a time sequence prediction device for the time sequence prediction method for lung nodule onset according to an embodiment of the present invention is described with reference to the accompanying drawings.
FIG. 4 is a schematic structural diagram of the time sequence prediction device for the time sequence prediction method for lung nodule onset according to an embodiment of the present invention.
The time sequence prediction device for the time sequence prediction method for lung nodule onset comprises:
a timing feature interaction module 100;
a time series data discriminator module 200;
a data noise reduction module 300;
a data enhancement module 400;
a prediction module 500;
the time sequence feature interaction module 100 strengthens the connection before and after the time sequence features;
the time sequence data discriminator module 200 judges the existence of the current time sequence feature;
the data noise reduction module 300 eliminates data with noise in the time sequence data;
the data enhancement module 400 increases the number of samples of the model and enriches the data set;
the prediction module 500 performs regression prediction on the processed data by transmitting the processed data into an extreme random tree.
Further, in one embodiment of the present invention, to ameliorate the problem of incomplete use of the extremely random tree model for time series features, a time series feature interaction module is used to add a priori knowledge of the time series.
Further, in one embodiment of the present invention, a time sequence data discriminator is used to determine whether the current data is meaningful; if the current data is null, the value is set to NaN.
Further, because the dimensions of the input time sequences are inconsistent, and noise is present when the dimension of an input time sequence is small, the ensemble empirical mode decomposition (EEMD) method is selected to further eliminate noise and strengthen the features.
Further, in one embodiment of the present invention, the main idea of using the TimeGAN generative adversarial network for data enhancement is to learn how to enhance the data from the data itself. TimeGAN uses the original data as a step-wise supervised loss, which lets the model capture the conditional distributions in the time sequence data. It also introduces an embedding network that provides a reversible mapping between features and latent representations, which reduces the high dimensionality of the generative adversarial learning space. Finally, the embedding network is trained jointly with the supervised loss to generate realistic time sequence data.
Further, in one embodiment of the present invention, the extreme random tree is chosen because it is sensitive to high-dimensional features, supports parallel computation, and can fully select useful features; because of its random sampling, the trained model has small variance and good generalization capability, and it is simple to implement in code. Before the regression algorithm of the extreme random tree is executed, the data enhanced and cleaned in the preceding steps are divided into a training set and a test set and then standardized. The generation of the extreme random tree is strongly influenced by three parameters: the random attribute P of a single tree node, the minimum number of samples N for splitting a tree node, and the model scale S. Grid search is therefore used for parameter optimization: the influence of different parameter values is measured through the mean square error, and the parameter combination with the lower error is considered better. The core of the implementation consists of four steps: sample selection, feature selection, decision tree construction, and extreme random tree prediction. The first three steps are iterated to form a forest, the test samples are input into the forest, each tree predicts the test samples, and the final result is obtained from the predictions of the individual trees.
It should be noted that the foregoing explanation of the method embodiment also applies to the device of this embodiment and is not repeated here.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (6)
1. A time sequence data prediction method for lung nodule onset, characterized by comprising the following steps:
Acquiring a time sequence data set containing a plurality of characteristics of lung nodule onset, and carrying out translation interaction processing on the time sequence data of each characteristic to acquire an initial time sequence data set;
respectively carrying out data noise reduction and data enhancement processing on the time sequence data of each feature in the initial time sequence data set to obtain a final time sequence data set;
and finally, carrying out regression processing on the final time sequence data set by using the extreme random tree to obtain a prediction result.
2. The method for predicting time series data for onset of pulmonary nodules of claim 1, wherein: the time sequence data set is set into a matrix structure taking time sequence data of each feature as a column vector, and the steps of carrying out translation interaction processing on the time sequence data of each feature are as follows:
1) Deep-copy the time sequence data set D to obtain D_shift;
2) If the number of translations n_diff is not 0, translate each feature column vector in the time sequence data set D by one feature unit and subtract 1 from n_diff;
3) Fill the null values generated by the translation with NaN, and add the new features generated by the translation to D_shift according to the corresponding dimensions;
4) Repeat steps 2) and 3) until n_diff is 0;
5) In D_shift, perform a linear operation between all the new features generated by the data translation at each moment and the features at the original moment;
6) Obtain the final data set and use it as the initial data set.
3. The method for predicting time series data for onset of pulmonary nodules of claim 2, wherein: judging each line of data one by one, if the characteristic value is NAN, marking the data segment at the moment, and finally deleting all marked data segments.
4. The method for predicting time series data for onset of pulmonary nodules of claim 1, wherein: noise reduction is performed on the initial time sequence data set using the ensemble empirical mode decomposition (EEMD) method, and the TimeGAN generative adversarial network is then used to perform multi-dimensional data enhancement on the time sequence data of each feature in the noise-reduced initial time sequence data set, so as to obtain the final time sequence data set.
5. A time series data prediction device based on the time series data prediction method for pulmonary nodule onset according to claim 1, characterized in that it comprises:
a time sequence data set, which adopts a matrix structure taking the time sequence data of each feature as a column vector;
a time sequence feature translation interaction module, configured to perform data translation and interaction processing on each feature column vector in the time sequence data set to obtain an initial time sequence data set;
a data noise reduction module, configured to perform noise reduction on the initial time sequence data set;
a data enhancement module, configured to perform data enhancement on the noise-reduced initial time sequence data set to obtain a final time sequence data set;
a prediction module, configured to perform prediction on the final time sequence data set.
6. The apparatus for predicting time series data for a pulmonary nodule onset according to claim 5, wherein: the data noise reduction module uses ensemble empirical mode decomposition (EEMD) to perform noise reduction on the initial time sequence data set, the data enhancement module uses a TimeGAN generative adversarial network to perform data enhancement on the noise-reduced initial time sequence data set, and the prediction module uses extremely randomized trees to perform prediction on the final time sequence data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311409606.6A CN117393149A (en) | 2023-10-27 | 2023-10-27 | Time sequence data prediction method for lung nodule pathogenesis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311409606.6A CN117393149A (en) | 2023-10-27 | 2023-10-27 | Time sequence data prediction method for lung nodule pathogenesis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117393149A (en) | 2024-01-12 |
Family
ID=89440550
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311409606.6A Pending CN117393149A (en) | 2023-10-27 | 2023-10-27 | Time sequence data prediction method for lung nodule pathogenesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117393149A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110674604B (en) | Transformer DGA data prediction method based on multi-dimensional time sequence frame convolution LSTM | |
CN108399428B (en) | Triple loss function design method based on trace ratio criterion | |
CN111079931A (en) | State space probabilistic multi-time-series prediction method based on graph neural network | |
CN111506814B (en) | Sequence recommendation method based on variational self-attention network | |
CN110889450B (en) | Super-parameter tuning and model construction method and device | |
CN112884236B (en) | Short-term load prediction method and system based on VDM decomposition and LSTM improvement | |
CN111222847A (en) | Open-source community developer recommendation method based on deep learning and unsupervised clustering | |
CN115051929A (en) | Network fault prediction method and device based on self-supervision target perception neural network | |
CN117033657A (en) | Information retrieval method and device | |
CN110415261A (en) | A kind of the expression animation conversion method and system of subregion training | |
CN114091429A (en) | Text abstract generation method and system based on heterogeneous graph neural network | |
KR20220066554A (en) | Method, apparatus and computer program for buildding knowledge graph using qa model | |
CN110288002B (en) | Image classification method based on sparse orthogonal neural network | |
CN112231455A (en) | Machine reading understanding method and system | |
CN114610871B (en) | Information system modeling analysis method based on artificial intelligence algorithm | |
CN116450827A (en) | Event template induction method and system based on large-scale language model | |
CN116737943A (en) | News field-oriented time sequence knowledge graph link prediction method | |
CN116467466A (en) | Knowledge graph-based code recommendation method, device, equipment and medium | |
CN117393149A (en) | Time sequence data prediction method for lung nodule pathogenesis | |
CN116166642A (en) | Spatio-temporal data filling method, system, equipment and medium based on guide information | |
CN115859048A (en) | Noise processing method and device for partial discharge signal | |
CN114529096A (en) | Social network link prediction method and system based on ternary closure graph embedding | |
CN114611990A (en) | Method and device for evaluating contribution rate of element system of network information system | |
CN112183848B (en) | Power load probability prediction method based on DWT-SVQR integration | |
CN114357160A (en) | Early rumor detection method and device based on generation propagation structure characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||