CN111143435B - Medicine cloud platform big data abnormity online early warning method based on statistical generation model - Google Patents


Info

Publication number
CN111143435B
Authority
CN
China
Prior art keywords
early warning
data
time
time sequence
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911379506.7A
Other languages
Chinese (zh)
Other versions
CN111143435A (en)
Inventor
张宸宇
陈海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zedaxin Pharmaceutical Alliance Information Technology Co ltd
Original Assignee
Hangzhou Zedaxin Pharmaceutical Alliance Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zedaxin Pharmaceutical Alliance Information Technology Co ltd
Priority to CN201911379506.7A
Publication of CN111143435A
Application granted
Publication of CN111143435B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses an online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model. To search for anomalous early-warning samples, the method uses an online Gaussian mixture statistical generative model. The model fits the probability distribution of medical data over its full life cycle, computes the occurrence probability of real-time sequence samples, and selects low-probability sequences as early-warning samples, thereby achieving online early warning of big-data anomalies on the medicine cloud platform.

Description

Medicine cloud platform big data abnormity online early warning method based on statistical generation model
Technical Field
The invention relates to a method for detecting and warning of big-data anomalies on a medicine cloud platform, and in particular to such a method based on a statistical generative model.
Background
A medicine cloud platform stores large volumes of data on drug manufacturing, storage, and circulation, as well as on patient medication habits and patterns. These data reflect the spatio-temporal distribution and future trends of various drugs and their associated diseases. Industry practitioners may want to track how the spatio-temporal distribution of a given drug class or brand changes, or to discover potential causal relationships among such changes. With data at this scale, the conventional reliance on periodic reports can no longer meet industry requirements for timeliness and operability, so spatio-temporal big-data mining algorithms are needed.
At present, the main difficulties in outlier mining for spatio-temporal event data are feature extraction and anomalous-sample search. The former filters massive raw data and extracts key feature points, generally using PLS (partial line segment) and its variants. The latter uses Dynamic Time Window (DTW) or clustering methods to find statistically distant samples as anomalous samples, based on various distance definitions in Euclidean space. Because data on manufacturing, logistics, regional circulation, and the like change only gradually in the medical field, the feature points extracted by conventional filtering are still too dense and retain many near-duplicate features, so feature extraction does not improve execution efficiency. Dynamic time window and clustering methods depend on a reasonable distance metric for sample sequences, and no ideal distance metric currently exists for medicine cloud platform data.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides an online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model. The method uses a direction-smoothing feature-point filter that removes large amounts of slowly varying spatio-temporal feature data and keeps a small number of feature points. To search for anomalous early-warning samples, it provides an online Gaussian mixture statistical generative model that fits the probability distribution of medical data over its full life cycle, computes the occurrence probability of real-time sequence samples, and selects low-probability sequences as early-warning samples.
The purpose of the invention is achieved by the following technical solution: an online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model, comprising the following steps:
(1) Feature filtering, comprising affine transformation and direction-smoothing filtering, as follows:
(1.1) The medicine cloud spatio-temporal data consists of a time sequence of fixed-length feature vectors. Let the feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \ldots, d_{tp} \rangle$; then $D = \langle D_1, D_2, \ldots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the sequence segment.
(1.2) Perform an affine transformation on each feature vector to map it into a p-dimensional finite space, and denote the affine-transformed feature vector at time t as $D'_t$.
(1.3) Perform feature filtering in the mapped pixel space, as follows:
(1.3.1) Input: time sequence segment $D = \langle D_1, D_2, \ldots, D_T \rangle$; affine-transformed time sequence segment $D' = \langle D'_1, D'_2, \ldots, D'_T \rangle$.
Output: filtered time sequence segment $DA = \langle Da_{r1}, Da_{r2}, \ldots, Da_{rk} \rangle$, where $r1, r2, \ldots, rk \in \{1, 2, \ldots, T\}$ and $k \le T$;
(1.3.2) traverse each component $D'_i$ ($i = 1, 2, \ldots, T$) of $D'$ in order;
(1.3.2.1) if $i = 1$ or $i = T$, add $D_i$ to DA;
(1.3.2.2) otherwise, compute the Euclidean distance between the vectors $D'_{i-1}$ and $D'_i$; if it is greater than the distance threshold minDis, add $D_i$ to DA.
(1.4) Direction-smoothing filtering: first find the weighted main direction of the time sequence segment, then filter according to it, as follows:
(1.4.1) Input: the time sequence segment DA produced by the previous filtering step. Output: the direction-smoothed time sequence segment DA';
(1.4.2) add $Da_{r1}$ to DA';
(1.4.3) set the variable index to r1 and the variable lastAngle to -1;
(1.4.4) traverse each component $Da_{ri}$ ($i = 2, \ldots, k-1$) of DA in order;
(1.4.4.1) compute the distance from $Da_{index}$ to $Da_{ri}$, denoted $DIS_{ri}$;
(1.4.4.2) compute the weighted angle from $Da_{index}$ to $Da_{ri}$, denoted $Angle_{ri}$;
(1.4.4.3) if lastAngle is not equal to -1 and the absolute value of the difference between lastAngle and $Angle_{ri}$ is greater than
Figure BDA0002341908980000021
then add $Da_{ri}$ to DA' and set index to ri; otherwise, filter out the point;
(1.4.4.4) set lastAngle to $Angle_{ri}$;
(1.4.5) finally, add $Da_{rk}$ to DA'.
(2) Statistical generative model calculation: generate a probability distribution model of the time sequence segments from historical data. The probability distribution of a time sequence segment is assumed a priori to be a Gaussian mixture function, defined as:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
where M is the number of Gaussian components in the Gaussian mixture function, and $k_i$ is the weight of the i-th Gaussian component and satisfies
$$\sum_{i=1}^{M} k_i = 1,$$
$N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian function, $u_i$ is the mean of the i-th Gaussian component, and $\Sigma_i$ is the covariance matrix of the i-th Gaussian component. A real-time online learning method is used so that the Gaussian mixture model is dynamically corrected as data accumulates, as follows:
(2.1) Initially, M takes a value in [1, 5]. Select N time sequence segments $D^{(1)}, D^{(2)}, \ldots, D^{(N)}$ from the historical data and generate an initial Gaussian mixture model using the standard EM algorithm.
(2.2) Continuously update the initial Gaussian mixture model as new time sequence segment data arrives, as follows:
(2.2.1) wait until R new time sequence segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \ldots, ND^{(R)}$;
(2.2.2) let $j = 1$ and $L = \{\}$, and let H be the current Gaussian mixture model;
(2.2.3) $E^{(j)} = \{E_1, E_2, \ldots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \ldots, M\}$, i.e., for each newly arrived segment $ND^{(j)}$, compute the value of each Gaussian component;
(2.2.4) normalize $E^{(j)}$;
(2.2.5) $I = \arg\max(E^{(j)})$, $V = \max(E^{(j)})$;
(2.2.6) if $V > 0.5$, then $L = L \cup \{ND^{(j)}\}$; otherwise go to step (2.2.8);
(2.2.7) if $|L| \ge N$, perform Gaussian mixture clustering on all data in L using the EM algorithm to obtain a new model HL, let $H = H \cup HL$, and let $L = \{\}$;
(2.2.8) assign $ND^{(j)}$ to the I-th Gaussian component of H and recalculate the mean of the I-th Gaussian component;
(2.2.9) let $j = j + 1$; if $j > R$, the algorithm ends; otherwise return to step (2.2.3).
(3) Early-warning determination. If the size of the set L remains smaller than N after T batches of new data have arrived, start early-warning determination and issue warnings for low-probability time sequence segments.
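The first filtering pass in step (1.3) can be illustrated with a minimal Python sketch. It assumes the affine-mapped vectors are already available; the function name `distance_filter` and the 0-based indices it returns are illustrative choices, not part of the patent.

```python
import math

def distance_filter(segment, mapped, min_dis):
    """Sketch of step (1.3): keep both endpoints, plus every point whose
    mapped vector lies more than min_dis (Euclidean) from the previous one.

    segment: original feature vectors D_1..D_T
    mapped:  affine-transformed vectors D'_1..D'_T (assumed precomputed)
    Returns a list of (index, original_vector) pairs, with 0-based indices.
    """
    if len(segment) != len(mapped):
        raise ValueError("segment and mapped must have the same length")
    kept = []
    last = len(segment) - 1
    for i, original in enumerate(segment):
        if i == 0 or i == last:
            kept.append((i, original))          # step (1.3.2.1)
        elif math.dist(mapped[i - 1], mapped[i]) > min_dis:
            kept.append((i, original))          # step (1.3.2.2)
    return kept

# Usage: the mapped vectors are taken equal to the originals purely for illustration.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (30, 0)]
kept_indices = [i for i, _ in distance_filter(points, points, min_dis=5)]
```

In this usage, the points at positions 1 and 3 are dropped because each lies within 5 units of its predecessor.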
Further, in step (1.2), an affine transformation is applied to each feature vector to map it into a p-dimensional finite space. Let the maximum length of the i-th dimension be $L_i$, $i \in \{1, 2, \ldots, p\}$, so that each dimension takes values in $[0, L_i]$. Denoting the affine-transformed feature vector at time t as $D'_t$, the affine transformation is defined by the following formula:
Figure BDA0002341908980000031
where $d'_{ti}$ ($i = 1, 2, \ldots, p$) is the i-th component of $D'_t$.
Further, in step (1.4.4.2), the weighted angle $Angle_{ri}$ is calculated as:
Figure BDA0002341908980000032
In the above formula, x denotes the dot product of vectors and d denotes the Euclidean distance between two vectors.
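The control flow of the direction-smoothing pass, steps (1.4.2) to (1.4.5), might be sketched as follows. The weighted-angle formula and the angle threshold are not reproduced in this text, so both are passed in as parameters: `angle_fn`, `angle_threshold`, and the `planar_angle` helper are hypothetical names, and the `atan2`-based angle is only a stand-in for the patent's weighted angle.

```python
import math

def direction_smooth(da, angle_fn, angle_threshold):
    """Sketch of steps (1.4.2)-(1.4.5): keep a point only when the angle
    from the last kept point changes by more than angle_threshold."""
    if not da:
        return []
    out = [da[0]]                     # (1.4.2) always keep the first point
    index = 0                         # (1.4.3) position of the last kept point
    last_angle = None                 # the source uses -1 as the "unset" marker
    for i in range(1, len(da) - 1):   # (1.4.4) interior points only
        angle = angle_fn(da[index], da[i])
        if last_angle is not None and abs(last_angle - angle) > angle_threshold:
            out.append(da[i])         # (1.4.4.3) direction changed: keep it
            index = i
        last_angle = angle            # (1.4.4.4)
    if len(da) > 1:
        out.append(da[-1])            # (1.4.5) always keep the last point
    return out

def planar_angle(a, b):
    """Stand-in angle in degrees for 2-D points; NOT the patent's formula."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

path = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
smoothed = direction_smooth(path, planar_angle, angle_threshold=10.0)
```

With this stand-in angle, the collinear points at the start of `path` are dropped and the points where the direction turns are retained.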
Further, in step (2.2.8), the mean of the I-th component is recalculated according to the following formula:
Figure BDA0002341908980000041
Further, in step (3), early-warning determination substitutes each new time sequence segment into the Gaussian mixture model; if the computed value is less than 0.1, a low-probability time sequence segment has occurred and a warning is issued for it.
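The threshold test in step (3) can be illustrated with a short Python sketch. To stay dependency-free it assumes diagonal covariance matrices, a simplification the patent does not state; the function names and the way parameters are passed are likewise illustrative.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (an assumption made here)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def mixture_density(x, weights, means, variances):
    """Weighted sum of the component densities at x."""
    return sum(k * diag_gaussian_pdf(x, u, v)
               for k, u, v in zip(weights, means, variances))

def is_early_warning(x, weights, means, variances, threshold=0.1):
    """Flag a segment whose mixture density falls below the 0.1 threshold."""
    return mixture_density(x, weights, means, variances) < threshold

# Usage with a single standard-normal component in one dimension.
weights, means, variances = [1.0], [[0.0]], [[1.0]]
typical = is_early_warning([0.0], weights, means, variances)   # density about 0.399
unusual = is_early_warning([3.0], weights, means, variances)   # density about 0.004
```

A full-covariance model would need a matrix inverse and determinant in place of the per-dimension loop.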
The beneficial effects of the invention are:
1. Sequence segment data are filtered by a two-step method comprising affine transformation and direction-smoothing filtering, which removes similar points, retains a small number of feature points, reduces the volume of data to analyze, and provides the data basis for the statistical generative model.
2. An online Gaussian mixture statistical generative model fits the probability distribution of the time sequence segment data, which makes it possible to estimate the occurrence probability of a time sequence segment and to issue early warnings.
Drawings
FIG. 1 is a graph of the characteristic filtering effect of an embodiment of the present invention.
FIG. 2 is a diagram illustrating a distribution of characteristics of time series data according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The invention provides an online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model, comprising the following steps:
(1) Feature filtering method
(1.1) The medicine cloud spatio-temporal data consists of a time sequence of fixed-length feature vectors. Let the feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \ldots, d_{tp} \rangle$; then $D = \langle D_1, D_2, \ldots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the sequence segment.
(1.2) Perform an affine transformation on each feature vector to map it into a p-dimensional finite space. Let the maximum length of the i-th dimension be $L_i$, $i \in \{1, 2, \ldots, p\}$, so that each dimension takes values in $[0, L_i]$. Denoting the affine-transformed feature vector at time t as $D'_t$, the affine transformation is defined by the following formula:
Figure BDA0002341908980000051
where $d'_{ti}$ ($i = 1, 2, \ldots, p$) is the i-th component of $D'_t$.
(1.3) The affine transformation converts the feature vectors into a pseudo-pixel space. Points that lie too close together in this space are highly similar, so only one of them is kept, which achieves feature filtering. The specific process is as follows:
(1.3.1) Input: time sequence segment $D = \langle D_1, D_2, \ldots, D_T \rangle$; affine-transformed time sequence segment $D' = \langle D'_1, D'_2, \ldots, D'_T \rangle$.
Output: filtered time sequence segment $DA = \langle Da_{r1}, Da_{r2}, \ldots, Da_{rk} \rangle$, where $r1, r2, \ldots, rk \in \{1, 2, \ldots, T\}$ and $k \le T$;
(1.3.2) traverse each component $D'_i$ ($i = 1, 2, \ldots, T$) of $D'$ in order;
(1.3.2.1) if $i = 1$ or $i = T$, add $D_i$ to DA;
(1.3.2.2) otherwise, compute the Euclidean distance between the vectors $D'_{i-1}$ and $D'_i$; if it is greater than the distance threshold minDis, add $D_i$ to DA. minDis usually takes a value in [5, 25].
(1.4) Direction-smoothing filtering. This filter considers the angle between consecutive feature vectors. Unlike other smoothing methods, it first finds the weighted main direction of a time sequence segment and then filters according to it, as follows:
(1.4.1) Input: the time sequence segment DA produced by the previous filtering step. Output: the direction-smoothed time sequence segment DA';
(1.4.2) add $Da_{r1}$ to DA';
(1.4.3) set the variable index to r1 and the variable lastAngle to -1;
(1.4.4) traverse each component $Da_{ri}$ ($i = 2, \ldots, k-1$) of DA in order;
(1.4.4.1) compute the distance from $Da_{index}$ to $Da_{ri}$, denoted $DIS_{ri}$;
(1.4.4.2) compute the weighted angle from $Da_{index}$ to $Da_{ri}$, denoted $Angle_{ri}$, using the following formula:
Figure BDA0002341908980000052
In the formula, x denotes the dot product of vectors and d denotes the Euclidean distance between two vectors;
(1.4.4.3) if lastAngle is not equal to -1 and the absolute value of the difference between lastAngle and $Angle_{ri}$ is greater than
Figure BDA0002341908980000053
then add $Da_{ri}$ to DA' and set index to ri; otherwise, filter out the point;
(1.4.4.4) set lastAngle to $Angle_{ri}$;
(1.4.5) finally, add $Da_{rk}$ to DA'.
(2) Statistical generative model calculation method. A probability distribution model of the time sequence segments is generated from historical data. The probability distribution of a time sequence segment is assumed a priori to be a Gaussian mixture function, defined as:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
where M is the number of Gaussian components in the Gaussian mixture function, and $k_i$ is the weight of the i-th Gaussian component and satisfies
$$\sum_{i=1}^{M} k_i = 1,$$
$N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian function, $u_i$ is the mean of the i-th Gaussian component, and $\Sigma_i$ is the covariance matrix of the i-th Gaussian component. M and all $k_i$, $u_i$, $\Sigma_i$ are unknown and must be learned from historical data. Because system data keeps growing and changing in practice, a real-time online learning method is designed so that the Gaussian mixture model can be dynamically corrected as data accumulates, as follows:
(2.1) Initially, M takes a value in [1, 5]. Select N time sequence segments $D^{(1)}, D^{(2)}, \ldots, D^{(N)}$ from the historical data and generate an initial Gaussian mixture model using the standard EM algorithm.
(2.2) Continuously update the initial Gaussian mixture model as new time sequence segment data arrives, as follows:
(2.2.1) wait until R new time sequence segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \ldots, ND^{(R)}$;
(2.2.2) let $j = 1$ and $L = \{\}$, and let H be the current Gaussian mixture model;
(2.2.3) $E^{(j)} = \{E_1, E_2, \ldots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \ldots, M\}$, i.e., for each newly arrived segment $ND^{(j)}$, compute the value of each Gaussian component;
(2.2.4) normalize $E^{(j)}$:
$$E^{(j)} = \left\{ \frac{E_1 - \min(E^{(j)})}{\max(E^{(j)}) - \min(E^{(j)})}, \ldots, \frac{E_M - \min(E^{(j)})}{\max(E^{(j)}) - \min(E^{(j)})} \right\}$$
where min and max are functions returning the minimum and maximum values respectively;
(2.2.5) $I = \arg\max(E^{(j)})$, $V = \max(E^{(j)})$;
(2.2.6) if $V > 0.5$, then $L = L \cup \{ND^{(j)}\}$; otherwise go to step (2.2.8);
(2.2.7) if $|L| \ge N$, perform Gaussian mixture clustering on all data in L using the EM algorithm to obtain a new model HL, let $H = H \cup HL$, and let $L = \{\}$;
(2.2.8) assign $ND^{(j)}$ to the I-th Gaussian component of H and recalculate the mean of the I-th component according to the following formula:
Figure BDA0002341908980000063
(2.2.9) let $j = j + 1$; if $j > R$, the algorithm ends; otherwise return to step (2.2.3).
(3) Early-warning determination. If the size of the set L remains smaller than N after T batches of new data have arrived (T is usually 2R to 10R), the early-warning determination process can start. Each new time sequence segment is substituted into the Gaussian mixture model; if the computed value is less than 0.1, a low-probability time sequence segment has occurred and a warning is issued for it.
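Steps (2.2.4) and (2.2.5) can be illustrated with a small Python sketch of the min-max normalization followed by the arg-max selection. The function names are illustrative, the returned component index is 0-based, and the handling of the case where all component values are equal is an assumption, since the text does not cover it.

```python
def normalize_component_values(values):
    """Min-max scale the component values E_1..E_M as in step (2.2.4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: the source does not define this case (e.g. M = 1),
        # so every entry is mapped to 0.0 to avoid dividing by zero.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def best_component(values):
    """Return (I, V) from step (2.2.5), with I as a 0-based index."""
    scaled = normalize_component_values(values)
    v = max(scaled)
    return scaled.index(v), v

# Usage with three hypothetical component values.
scaled = normalize_component_values([0.25, 1.25, 0.75])
index, value = best_component([0.25, 1.25, 0.75])
```

Under min-max scaling the largest entry is 1.0 whenever the values are not all equal, which is worth keeping in mind when reading the 0.5 comparison in step (2.2.6).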
A specific application example of the invention is given below. Some acute infectious diseases spread quickly, have long incubation periods, and are easily misdiagnosed. For example, tuberculosis, a Class B infectious disease, is spread by droplets, has an incubation period of 2-3 weeks after infection, and is easily misdiagnosed as a viral cold. This makes prevention and treatment very difficult, and timely early warning is especially necessary when such a disease spreads rapidly on a large scale.
Using this method, the regional dosage of anti-tuberculosis drugs and antiviral cold drugs, such as ethambutol, quinolone, and loratadine, is monitored online, and a statistical generative model is built to search for low-probability anomalous time-series data, which provides early-warning capability for the spread of potential diseases. The steps are as follows:
1. Seven years of data for 34 anti-tuberculosis and antiviral cold drugs in a certain region are selected. For effective monitoring, dosage is computed hourly, and the dosage over 24 hours forms the minimum time sequence segment. This gives 34 × 7 × 365 = 86,870 time sequence segments and 34 × 7 × 365 × 24 = 2,084,880 data items.
2. Because dosage data are influenced by external factors such as population and the economy, the data must be normalized to remove these effects. Specifically, the mean and standard deviation are computed for each year, and each data item is normalized by subtracting the mean and dividing by the standard deviation.
3. The feature filtering method of the invention is applied to time sequence segments in units of years (12 minimum time sequence segments); FIG. 1 shows the difference before and after filtering. The filter preserves the direction-change features of the time-series data and deletes slowly varying data items.
4. The method of the invention is then used to estimate the probability distribution of the time sequence segments, with the minimum time sequence segment as the basic unit of estimation. The probability distribution data are shown in FIG. 2.
All time sequence segments with probability density values below 0.1 are selected; there are two in this example. In the first, the regional dosage of quinolone in November rose and fell markedly beyond the levels of previous years (marked (1) in FIG. 2), and the average probability density of this segment was 0.061. In the second, the dosage of cycloserine in the same month rose suddenly relative to previous years (marked (2) in FIG. 2), and the average probability density of this segment was 0.0396. Based on the difference in probability density values, the anomalies of the two drugs can be displayed visually and warnings sent automatically to the relevant industry managers, helping them extract more valuable information from large volumes of medical data.
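The per-year normalization in step 2 and the counts in step 1 of this example can be checked with a few lines of Python. The text does not say whether the standard deviation is the population or the sample form; this sketch assumes the population form, and the function name `normalize_year` and the sample values are hypothetical.

```python
import statistics

def normalize_year(values):
    """Z-score one year of dosage values: subtract the yearly mean and
    divide by the yearly standard deviation (population form assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: a constant year carries no variation, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical dosage values with mean 5 and population standard deviation 2.
normalized = normalize_year([2, 4, 4, 4, 5, 5, 7, 9])

# Counts from step 1: 34 drugs, 7 years, 365 days, 24 hourly items per day.
segments = 34 * 7 * 365
data_items = segments * 24
```

Normalizing within each year removes slow year-to-year drift, so segments from different years become comparable before the mixture model is fitted.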
The foregoing is only a preferred embodiment of the present invention, and although the present invention has been disclosed in the preferred embodiments, it is not intended to limit the present invention. Those skilled in the art can make numerous possible variations and modifications to the present teachings, or modify equivalent embodiments to equivalent variations, without departing from the scope of the present teachings, using the methods and techniques disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are still within the scope of the protection of the technical solution of the present invention, unless the contents of the technical solution of the present invention are departed.

Claims (5)

1. An online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model, characterized by comprising the following steps:
(1) collecting medicine cloud spatio-temporal data as input to the early-warning method; the medicine cloud spatio-temporal data consists of a time sequence of fixed-length feature vectors and comprises drug dosage and patient medication time data; let the medication feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \ldots, d_{tp} \rangle$; then $D = \langle D_1, D_2, \ldots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the sequence segment;
(2) feature filtering, comprising affine transformation and direction-smoothing filtering, as follows:
(2.1) performing an affine transformation on each feature vector to map it into a p-dimensional finite space, and denoting the affine-transformed feature vector at time t as $D'_t$;
(2.2) performing feature filtering in the mapped pixel space, as follows:
(2.2.1) input: medication time sequence segment $D = \langle D_1, D_2, \ldots, D_T \rangle$; affine-transformed time sequence segment $D' = \langle D'_1, D'_2, \ldots, D'_T \rangle$;
output: filtered time sequence segment $DA = \langle Da_{r1}, Da_{r2}, \ldots, Da_{rk} \rangle$, where $r1, r2, \ldots, rk \in \{1, 2, \ldots, T\}$ and $k \le T$;
(2.2.2) traversing each component $D'_i$ ($i = 1, 2, \ldots, T$) of $D'$ in order;
(2.2.2.1) if $i = 1$ or $i = T$, adding $D_i$ to DA;
(2.2.2.2) otherwise, computing the Euclidean distance between the vectors $D'_{i-1}$ and $D'_i$, and if it is greater than the distance threshold minDis, adding $D_i$ to DA;
(2.3) direction-smoothing filtering: first finding the weighted main direction of the time sequence segment, then filtering according to it, as follows:
(2.3.1) input: the time sequence segment DA produced by the previous filtering step; output: the direction-smoothed time sequence segment DA';
(2.3.2) adding $Da_{r1}$ to DA';
(2.3.3) setting the variable index to r1 and the variable lastAngle to -1;
(2.3.4) traversing each component $Da_{ri}$ ($i = 2, \ldots, k-1$) of DA in order;
(2.3.4.1) computing the distance from $Da_{index}$ to $Da_{ri}$, denoted $DIS_{ri}$;
(2.3.4.2) computing the weighted angle from $Da_{index}$ to $Da_{ri}$, denoted $Angle_{ri}$;
(2.3.4.3) if lastAngle is not equal to -1 and the absolute value of the difference between lastAngle and $Angle_{ri}$ is greater than
Figure FDA0002812232120000011
then adding $Da_{ri}$ to DA' and setting index to ri; otherwise, filtering out $Da_{ri}$;
(2.3.4.4) setting lastAngle to $Angle_{ri}$;
(2.3.5) finally, adding $Da_{rk}$ to DA';
(3) statistical generative model calculation: generating a probability distribution model of the time sequence segments from the medicine cloud spatio-temporal historical data of step (1), wherein the probability distribution of a time sequence segment is assumed a priori to be a Gaussian mixture function, defined as:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
where M is the number of Gaussian components in the Gaussian mixture function, and $k_i$ is the weight of the i-th Gaussian component and satisfies
$$\sum_{i=1}^{M} k_i = 1,$$
$N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian function, $u_i$ is the mean of the i-th Gaussian component, and $\Sigma_i$ is the covariance matrix of the i-th Gaussian component; a real-time online learning method is used so that the Gaussian mixture model is dynamically corrected as data accumulates, as follows:
(3.1) initially, M takes a value in [1, 5]; selecting N time sequence segments $D^{(1)}, D^{(2)}, \ldots, D^{(N)}$ from the historical data and generating an initial Gaussian mixture model using the standard EM algorithm;
(3.2) continuously updating the initial Gaussian mixture model as new time sequence segment data arrives, as follows:
(3.2.1) waiting until R new time sequence segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \ldots, ND^{(R)}$;
(3.2.2) letting $j = 1$ and $L = \{\}$, and letting H be the current Gaussian mixture model;
(3.2.3) $E^{(j)} = \{E_1, E_2, \ldots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \ldots, M\}$, i.e., for each newly arrived segment $ND^{(j)}$, computing the value of each Gaussian component;
(3.2.4) normalizing $E^{(j)}$;
(3.2.5) $I = \arg\max(E^{(j)})$, $V = \max(E^{(j)})$;
(3.2.6) if $V > 0.5$, then $L = L \cup \{ND^{(j)}\}$; otherwise going to step (3.2.8);
(3.2.7) if $|L| \ge N$, performing Gaussian mixture clustering on all data in L using the EM algorithm to obtain a new model HL, letting $H = H \cup HL$, and letting $L = \{\}$;
(3.2.8) assigning $ND^{(j)}$ to the I-th Gaussian component of H and recalculating the mean of the I-th Gaussian component;
(3.2.9) letting $j = j + 1$; if $j > R$, the algorithm ends; otherwise returning to step (3.2.3);
(4) early-warning determination: if the size of the set L remains smaller than N after T batches of new data have arrived, starting early-warning determination and issuing warnings for low-probability time sequence segments.
2. The medicine cloud platform big data abnormity online early warning method based on the statistic generation model as claimed in claim 1, wherein in the step (2.1), affine transformation is performed on each feature vector to enable each feature vector to be mapped to a p-dimensional finite space, and the maximum length of each dimension is set as LiI belongs to {1,2,. eta., p }, and the value range of each dimension is [0, L ]i](ii) a Feature vector of affine transformation at time t is recorded as D'tThen the affine transformation is defined by the following formula:
Figure FDA0002812232120000031
wherein $d'_{ti}$ ($i = 1, 2, \ldots, p$) is the $i$-th component of $D'_t$.
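The claimed formula itself is only available as a figure and is not reproduced here. As an illustration of what an affine map into $[0, L_i]$ can look like, the sketch below uses per-dimension min-max scaling; this specific choice, the function name `affine_rescale`, and the handling of constant dimensions are assumptions and may differ from the formula in the figure.

```python
def affine_rescale(vectors, lengths):
    """Map each dimension i of the feature vectors into [0, lengths[i]].

    Uses min-max scaling per dimension, which is one possible affine map
    with the stated range (an assumption, not the claimed formula).
    """
    p = len(lengths)
    lows = [min(v[i] for v in vectors) for i in range(p)]
    highs = [max(v[i] for v in vectors) for i in range(p)]
    rescaled = []
    for v in vectors:
        row = []
        for i in range(p):
            span = highs[i] - lows[i]
            # A constant dimension carries no information; map it to 0.
            row.append(0.0 if span == 0 else lengths[i] * (v[i] - lows[i]) / span)
        rescaled.append(row)
    return rescaled
```

For example, with `lengths = [1.0, 100.0]` the first dimension of every vector lands in $[0, 1]$ and the second in $[0, 100]$, so dimensions with very different raw scales become comparable before segmentation.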
3. The medicine cloud platform big data abnormity online early warning method based on the statistical generation model as claimed in claim 1, wherein in the step (2.3.4.2), the weighted angle $Angle_{ri}$ is calculated by the following formula:
Figure FDA0002812232120000032
in the above formula, x denotes the dot product of vectors, and d denotes the Euclidean distance between two vectors.
4. The medicine cloud platform big data abnormity online early warning method based on the statistical generation model as claimed in claim 1, wherein in the step (2.2.8), the mean value of the I-th component is recalculated according to the following formula:
Figure FDA0002812232120000033
5. The medicine cloud platform big data abnormity online early warning method based on the statistical generation model as claimed in claim 1, wherein in the step (3), the early warning judgment is made by substituting each new time series segment into the Gaussian mixture model; if the calculated value is less than 0.1, a small-probability time series segment has occurred, and an early warning is issued for that segment.
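The early warning check of this claim, together with the trigger condition of step (4) in claim 1, might be sketched as below. The diagonal covariance, the `"weight"`, `"mean"` and `"var"` keys, and all function names are hypothetical, and reading "always smaller than N after T batches" as "smaller than N for each of the last T batches" is an interpretation rather than something the claim spells out.

```python
import math


def mixture_density(x, components):
    """Weighted sum of diagonal-covariance Gaussian densities at x."""
    total = 0.0
    for c in components:
        log_p = 0.0
        for xi, mi, vi in zip(x, c["mean"], c["var"]):
            log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += c["weight"] * math.exp(log_p)
    return total


def warning_stage_active(pool_sizes, n_refit, t_batches):
    """Step (4): true once |L| stayed below N for each of the last T batches."""
    recent = pool_sizes[-t_batches:]
    return len(recent) == t_batches and all(size < n_refit for size in recent)


def flag_segments(segments, components, threshold=0.1):
    """Claim 5: return the segments whose mixture value is below the threshold."""
    return [s for s in segments if mixture_density(s, components) < threshold]
```

Because a density value depends on the scale and dimensionality of the features, a fixed threshold such as 0.1 is only meaningful for features normalized in a consistent way.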
CN201911379506.7A 2019-12-27 2019-12-27 Medicine cloud platform big data abnormity online early warning method based on statistical generation model Active CN111143435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379506.7A CN111143435B (en) 2019-12-27 2019-12-27 Medicine cloud platform big data abnormity online early warning method based on statistical generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911379506.7A CN111143435B (en) 2019-12-27 2019-12-27 Medicine cloud platform big data abnormity online early warning method based on statistical generation model

Publications (2)

Publication Number Publication Date
CN111143435A CN111143435A (en) 2020-05-12
CN111143435B true CN111143435B (en) 2021-04-13

Family

ID=70521108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379506.7A Active CN111143435B (en) 2019-12-27 2019-12-27 Medicine cloud platform big data abnormity online early warning method based on statistical generation model

Country Status (1)

Country Link
CN (1) CN111143435B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073883A (en) * 2009-11-19 2011-05-25 夏普株式会社 Method and equipment for detecting subsequence in time sequence data
EP3244339A1 (en) * 2016-05-10 2017-11-15 Aircloak GmbH Systems and methods for anonymized statistical database queries using noise elements
CN108198596A (en) * 2018-03-23 2018-06-22 顾泰来 Medication management device, terminal and method for medical institutions
CN109858748A (en) * 2018-12-26 2019-06-07 航天信息股份有限公司 Food and drug supervision intelligent terminal acquisition system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205112A (en) * 2015-09-01 2015-12-30 西安交通大学 System and method for excavating abnormal features of time series data
CN106992900A (en) * 2016-01-20 2017-07-28 北京国双科技有限公司 The method and intelligent early-warning notification platform of monitoring and early warning
WO2017146930A1 (en) * 2016-02-22 2017-08-31 Rapiscan Systems, Inc. Systems and methods for detecting threats and contraband in cargo
US10324961B2 (en) * 2017-01-17 2019-06-18 International Business Machines Corporation Automatic feature extraction from a relational database
CN107180076B (en) * 2017-04-18 2018-08-24 中国检验检疫科学研究院 Pesticide residue visual method based on high resolution mass spectrum+internet+geography information
US11741114B2 (en) * 2017-12-19 2023-08-29 ExxonMobil Technology and Engineering Company Data analysis platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
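Design and Implementation of a J2EE-Based Pharmaceutical Sales Management System; 陈福元; China Master's Theses Full-text Database, Information Science and Technology; 2018-07-15; pp. 1-50 *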

Also Published As

Publication number Publication date
CN111143435A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
Nguyen et al. Deepr: a convolutional net for medical records
CN106846361B (en) Target tracking method and device based on intuitive fuzzy random forest
Zhang et al. Clustering-based missing value imputation for data preprocessing
Sadiq et al. Mining anomalies in medicare big data using patient rule induction method
CN110929752B (en) Grouping method based on knowledge driving and data driving and related equipment
Thakran et al. Unsupervised outlier detection in streaming data using weighted clustering
CN112992370B (en) Unsupervised electronic medical record-based medical behavior compliance assessment method
CN113255841B (en) Clustering method, clustering device and computer readable storage medium
Xie et al. Retinal vascular image segmentation using genetic algorithm plus FCM clustering
Winter et al. Fast indexing strategies for robust image hashes
US8977061B2 (en) Merging face clusters
WO2023029347A1 (en) Multi-source data-based disease early warning method and apparatus, device, and storage medium
CN114218009A (en) Time series abnormal value detection method, device, equipment and storage medium
WO2023082641A1 (en) Electronic archive generation method and apparatus, and terminal device and storage medium
Jaiswal et al. Deep learned cumulative attribute regression
CN110083724B (en) Similar image retrieval method, device and system
CN111143435B (en) Medicine cloud platform big data abnormity online early warning method based on statistical generation model
Külah et al. COVID-19 forecasting using shifted Gaussian Mixture Model with similarity-based estimation
US11031044B1 (en) Method, system and computer program product for self-learned and probabilistic-based prediction of inter-camera object movement
Settipalli et al. Predictive and adaptive drift analysis on decomposed healthcare claims using ART based topological clustering
US20230068453A1 (en) Methods and systems for determining and displaying dynamic patient readmission risk and intervention recommendation
Lin et al. Proximity-aware hierarchical clustering of unconstrained faces
CN111507424B (en) Data processing method and device
CN109872183A (en) Intelligent Service evaluation method, computer readable storage medium and terminal device
Abubakar et al. A convolutional neural network with K-nearest neighbor for image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant