CN111143435B - Medicine cloud platform big data abnormity online early warning method based on statistical generation model - Google Patents
- Publication number: CN111143435B (application number CN201911379506.7A)
- Authority: CN (China)
- Prior art keywords
- early warning
- data
- time
- time sequence
- gaussian
- Legal status: Active (Google Patents notes that the listed status is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention discloses an online early-warning method for big-data anomalies on a medicine cloud platform, based on a statistical generative model. To find the samples that warrant an early warning, the method uses an online Gaussian-mixture statistical generative model. This model fits the probability distribution of medical data over its full life cycle and computes an occurrence probability for each real-time sequence sample. Low-probability sequences are selected as early-warning samples, which provides online anomaly early warning for the big data of a medicine cloud platform.
Description
Technical Field
The invention relates to a method for detecting and giving early warning of big-data anomalies on a medicine cloud platform, and in particular to such a method based on a statistical generative model.
Background
A medicine cloud platform stores large volumes of data on drug manufacturing, storage and circulation, as well as data on patients' medication habits and patterns. These data reflect the spatio-temporal distribution and future trends of various drugs and their associated diseases. Industry practitioners may want to track how a particular class or brand of drug varies in space and time, or to find potential causal relationships among such variations. At this data scale, conventional periodic reports cannot meet industry requirements for timeliness and operability, so spatio-temporal big-data mining algorithms are needed.
At present, outlier mining for spatio-temporal event data faces two main difficulties: feature extraction and abnormal-sample search. Feature extraction filters the massive raw data and extracts key feature points, usually with PLS (partial line segment) methods and their variants. Abnormal-sample search uses dynamic time warping (DTW) or clustering to find samples that are statistically distant from the rest, under some distance defined in Euclidean space. Production, logistics and regional-circulation data in the pharmaceutical field change gently. As a result, the feature points extracted by conventional filtering remain too dense and retain many similar, repeated features, so feature extraction does not improve the algorithm's execution efficiency. Methods based on DTW or clustering also depend on a reasonable distance measure between sample sequences, and no satisfactory distance measure currently exists for medicine cloud platform data.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides an online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model. The method uses a direction-smoothing feature-point filter that removes a large amount of gently varying spatio-temporal feature data and retains only a small number of feature points. To search for abnormal early-warning samples, the method provides an online Gaussian-mixture statistical generative model. The model fits the probability distribution of medical data over its full life cycle, computes the occurrence probability of real-time sequence samples, and selects low-probability sequences as early-warning samples.
The purpose of the invention is achieved by the following technical scheme: an online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model, comprising the following steps:
(1) Feature filtering, comprising an affine transformation and direction-smoothing filtering, as follows:
(1.1) The medicine cloud spatio-temporal data consist of a time series of fixed-length feature vectors. Let the feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \dots, d_{tp} \rangle$. Then $D = \langle D_1, D_2, \dots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the segment (its length).
(1.2) Apply an affine transformation to each feature vector to map it into a bounded p-dimensional space, and denote the transformed feature vector at time t by $D'_t$.
(1.3) Perform feature filtering in the mapped pixel space, as follows:
(1.3.1) Input: the time-series segment $D = \langle D_1, D_2, \dots, D_T \rangle$ and the affine-transformed segment $D' = \langle D'_1, D'_2, \dots, D'_T \rangle$.
Output: the filtered time-series segment $DA = \langle Da_{r_1}, Da_{r_2}, \dots, Da_{r_k} \rangle$, where $r_1, r_2, \dots, r_k \in \{1, 2, \dots, T\}$ and $k \le T$.
(1.3.2) Traverse the components $D'_i$ of $D'$ in order ($i = 1, 2, \dots, T$):
(1.3.2.1) if $i = 1$ or $i = T$, add $D_i$ to DA;
(1.3.2.2) otherwise, compute the Euclidean distance between $D'_{i-1}$ and $D'_i$; if it is greater than the distance threshold minDis, add $D_i$ to DA.
(1.4) Direction-smoothing filtering: first find the weighted main direction of the time-series segment, then filter according to it, as follows:
(1.4.1) Input: the segment DA produced by the previous filtering step. Output: the direction-smoothed segment DA'.
(1.4.2) Add $Da_{r_1}$ to DA'.
(1.4.3) Initialize the variable index to $r_1$ and lastAngle to -1.
(1.4.4) Traverse the components $Da_{r_i}$ of DA in order ($i = 2, \dots, k-1$):
(1.4.4.1) compute the distance from $Da_{index}$ to $Da_{r_i}$, denoted $DIS_{r_i}$;
(1.4.4.2) compute the weighted angle from $Da_{index}$ to $Da_{r_i}$, denoted $Angle_{r_i}$;
(1.4.4.3) if lastAngle ≠ -1 and $|lastAngle - Angle_{r_i}|$ is greater than the preset angle threshold, add $Da_{r_i}$ to DA' and set index to $r_i$; otherwise the point is filtered out;
(1.4.4.4) set lastAngle to $Angle_{r_i}$.
(1.4.5) Finally, add $Da_{r_k}$ to DA'.
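Step (1.3) amounts to a single pass over the mapped segment. Below is a minimal Python sketch of it, assuming the mapped vectors are already available as lists of numbers. The function name `distance_filter` and the default `min_dis=10.0` are illustrative choices, not from the patent; the detailed description suggests taking minDis in [5, 25]. The function returns 0-based time indices, so the original vectors can be picked with `[D[r] for r in distance_filter(D_mapped)]`.

```python
import math
from typing import List, Sequence


def distance_filter(mapped: Sequence[Sequence[float]],
                    min_dis: float = 10.0) -> List[int]:
    """Step (1.3): return the 0-based indices r_1..r_k of the points kept in DA.

    mapped  -- the affine-transformed segment D'_1..D'_T
    min_dis -- the distance threshold minDis (illustrative default)
    """
    T = len(mapped)
    kept: List[int] = []
    for i in range(T):
        if i == 0 or i == T - 1:
            # (1.3.2.1) the first and last points are always kept
            kept.append(i)
        elif math.dist(mapped[i - 1], mapped[i]) > min_dis:
            # (1.3.2.2) compare with the immediately preceding point D'_{i-1},
            # as the step is written, not with the last point that was kept
            kept.append(i)
    return kept
```

Because each point is compared with its predecessor, a slow drift made of many small steps is filtered out entirely, even if the drift is large in total. That matches the goal of removing gently varying data.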
(2) Statistical generative model: build a probability distribution model of time-series segments from historical data. The probability distribution of a segment is assumed a priori to be a Gaussian mixture:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
Here M is the number of Gaussian components in the mixture, and $k_i$ is the weight of the i-th component, satisfying $\sum_{i=1}^{M} k_i = 1$. $N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian density, $u_i$ is the mean of the i-th component, and $\Sigma_i$ is its covariance matrix. The Gaussian mixture model is learned online in real time and corrected dynamically as data accumulate, as follows:
(2.1) Choose the initial M in [1, 5], select N time-series segments $D^{(1)}, D^{(2)}, \dots, D^{(N)}$ from historical data, and generate the initial Gaussian mixture model with the standard EM algorithm.
(2.2) Keep updating the Gaussian mixture model as new time-series segments arrive:
(2.2.1) wait until R new segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \dots, ND^{(R)}$;
(2.2.2) set j = 1 and L = { }, and let H be the current Gaussian mixture model;
(2.2.3) compute $E^{(j)} = \{E_1, E_2, \dots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \dots, M\}$, i.e., the value of each Gaussian component for the newly arrived segment $ND^{(j)}$;
(2.2.4) normalize $E^{(j)}$;
(2.2.5) set $I = \arg\max(E^{(j)})$ and $V = \max(E^{(j)})$;
(2.2.6) if V > 0.5, set $L = L \cup \{ND^{(j)}\}$; otherwise go to step (2.2.8);
(2.2.7) if $|L| \ge N$, run Gaussian-mixture clustering on all data in L with the EM algorithm to obtain a new model HL, set H = HL, and reset L = { };
(2.2.8) assign $ND^{(j)}$ to the I-th Gaussian component of H and recompute the mean of that component;
(2.2.9) set j = j + 1; if j > R, stop; otherwise return to step (2.2.3).
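Step (2.1) only says "standard EM algorithm". Below is a compact, dependency-free sketch of such an EM fit, written under several assumptions that do not come from the patent:

- Each time-series segment is flattened to a fixed-length numeric vector.
- The covariance matrices $\Sigma_i$ are diagonal.
- The means are initialized by deterministic farthest-point selection.

The names `fit_gmm_em` and `log_gauss_diag` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def log_gauss_diag(x: Vec, mean: Vec, var: Vec) -> float:
    """log N(x | u, Sigma) for a diagonal covariance Sigma = diag(var)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fit_gmm_em(data: Sequence[Vec], M: int, iters: int = 100,
               var_floor: float = 1e-6
               ) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Fit a diagonal-covariance Gaussian mixture by EM; return (k_i, u_i, var_i)."""
    n, p = len(data), len(data[0])
    # Deterministic farthest-point initialisation of the M means.
    means = [list(data[0])]
    while len(means) < M:
        far = max(data, key=lambda x: min(math.dist(x, m) for m in means))
        means.append(list(far))
    centre = [sum(x[d] for x in data) / n for d in range(p)]
    global_var = [max(sum((x[d] - centre[d]) ** 2 for x in data) / n, var_floor)
                  for d in range(p)]
    variances = [list(global_var) for _ in range(M)]
    weights = [1.0 / M] * M
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x in data:
            logs = [math.log(weights[i]) + log_gauss_diag(x, means[i], variances[i])
                    for i in range(M)]
            top = max(logs)
            unnorm = [math.exp(lv - top) for lv in logs]
            s = sum(unnorm)
            resp.append([u / s for u in unnorm])
        # M-step: re-estimate weights k_i, means u_i and diagonal variances.
        for i in range(M):
            nk = sum(r[i] for r in resp)
            if nk < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[i] = nk / n
            means[i] = [sum(r[i] * x[d] for r, x in zip(resp, data)) / nk
                        for d in range(p)]
            variances[i] = [max(sum(r[i] * (x[d] - means[i][d]) ** 2
                                    for r, x in zip(resp, data)) / nk, var_floor)
                            for d in range(p)]
        total = sum(weights)
        weights = [w / total for w in weights]  # keep sum(k_i) = 1
    return weights, means, variances
```

For step (2.1), one would call `fit_gmm_em(segments, M)` with M in [1, 5] on the N historical segments. The same routine could also serve as the EM re-clustering in step (2.2.7). A production system would more likely use an established library implementation.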
(3) Early-warning judgment: if, after T batches of new data have arrived, the size of set L has always stayed below N, start early-warning judgment and issue warnings for low-probability time-series segments.
Further, in step (1.2), each feature vector is mapped by an affine transformation into a bounded p-dimensional space. Let the maximum length of each dimension be $L_i$, $i \in \{1, 2, \dots, p\}$, so that dimension i takes values in $[0, L_i]$, and denote the transformed feature vector at time t by $D'_t$. The affine transformation is defined by the following formula:
where $d'_{ti}$ ($i = 1, 2, \dots, p$) is the i-th component of $D'_t$.
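The formula for this affine map is not preserved in the text above. Below is a minimal sketch under the assumption that the map is per-dimension min-max scaling over the segment: $d'_{ti} = L_i (d_{ti} - \min_t d_{ti}) / (\max_t d_{ti} - \min_t d_{ti})$. This is one affine map that sends each dimension onto $[0, L_i]$. The function name `affine_map` is hypothetical.

```python
from typing import List, Sequence


def affine_map(segment: Sequence[Sequence[float]],
               lengths: Sequence[float]) -> List[List[float]]:
    """Map each p-dimensional vector D_t into [0, L_1] x ... x [0, L_p].

    Assumption: min-max scaling per dimension over the whole segment; the
    patent's own formula is not reproduced here.
    """
    p = len(lengths)
    lows = [min(vec[i] for vec in segment) for i in range(p)]
    highs = [max(vec[i] for vec in segment) for i in range(p)]
    mapped = []
    for vec in segment:
        row = []
        for i in range(p):
            span = highs[i] - lows[i]
            # A constant dimension carries no shape information; map it to 0.
            row.append(0.0 if span == 0 else lengths[i] * (vec[i] - lows[i]) / span)
        mapped.append(row)
    return mapped
```

Scaling every dimension onto a fixed range is what makes a single distance threshold such as minDis meaningful across dimensions with very different units.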
Further, in step (1.4.4.2), the weighted angle $Angle_{r_i}$ is computed by the following formula:
in which × denotes the dot product of vectors and d denotes the Euclidean distance between two vectors.
Further, in step (2.2.8), the mean of the I-th component is recomputed by the following formula:
Further, in step (3), early-warning judgment substitutes each new time-series segment into the Gaussian mixture model. A computed value below 0.1 indicates a low-probability segment, and a warning is issued for it.
The beneficial effects of the invention are:
1. A two-step filter (affine transformation followed by direction-smoothing filtering) removes similar points from the sequence-segment data and keeps a small number of feature points. This reduces the volume of data to analyze and provides the data basis for the statistical generative model.
2. An online Gaussian-mixture statistical generative model fits the probability distribution of the time-series segment data, which makes it possible to estimate the occurrence probability of a segment and to issue early warnings.
Drawings
FIG. 1 is a graph of the characteristic filtering effect of an embodiment of the present invention.
FIG. 2 is a diagram illustrating a distribution of characteristics of time series data according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The invention provides an online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model, comprising the following steps:
(1) feature filtering method
(1.1) The medicine cloud spatio-temporal data consist of a time series of fixed-length feature vectors. Let the feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \dots, d_{tp} \rangle$. Then $D = \langle D_1, D_2, \dots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the segment (its length).
(1.2) Apply an affine transformation to each feature vector to map it into a bounded p-dimensional space. Let the maximum length of each dimension be $L_i$, $i \in \{1, 2, \dots, p\}$, so that dimension i takes values in $[0, L_i]$, and denote the transformed feature vector at time t by $D'_t$. The affine transformation is defined by the following formula:
where $d'_{ti}$ ($i = 1, 2, \dots, p$) is the i-th component of $D'_t$.
(1.3) The affine transformation converts the feature vectors into an artificial pixel space. Points that lie too close together in this space are highly similar, so only one of them is kept, which achieves feature filtering. The procedure is as follows:
(1.3.1) Input: the time-series segment $D = \langle D_1, D_2, \dots, D_T \rangle$ and the affine-transformed segment $D' = \langle D'_1, D'_2, \dots, D'_T \rangle$.
Output: the filtered time-series segment $DA = \langle Da_{r_1}, Da_{r_2}, \dots, Da_{r_k} \rangle$, where $r_1, r_2, \dots, r_k \in \{1, 2, \dots, T\}$ and $k \le T$.
(1.3.2) Traverse the components $D'_i$ of $D'$ in order ($i = 1, 2, \dots, T$):
(1.3.2.1) if $i = 1$ or $i = T$, add $D_i$ to DA;
(1.3.2.2) otherwise, compute the Euclidean distance between $D'_{i-1}$ and $D'_i$; if it is greater than the distance threshold minDis, add $D_i$ to DA. minDis is usually taken in [5, 25].
(1.4) Direction-smoothing filtering. This filter considers the angle between consecutive feature vectors. Unlike other smoothing methods, it first finds the weighted main direction of the time-series segment and then filters according to that direction. The steps are:
(1.4.1) Input: the segment DA produced by the previous filtering step. Output: the direction-smoothed segment DA'.
(1.4.2) Add $Da_{r_1}$ to DA'.
(1.4.3) Initialize the variable index to $r_1$ and lastAngle to -1.
(1.4.4) Traverse the components $Da_{r_i}$ of DA in order ($i = 2, \dots, k-1$):
(1.4.4.1) compute the distance from $Da_{index}$ to $Da_{r_i}$, denoted $DIS_{r_i}$;
(1.4.4.2) compute the weighted angle from $Da_{index}$ to $Da_{r_i}$, denoted $Angle_{r_i}$, by the following formula:
in which × denotes the dot product of vectors and d denotes the Euclidean distance between two vectors;
(1.4.4.3) if lastAngle ≠ -1 and $|lastAngle - Angle_{r_i}|$ is greater than the preset angle threshold, add $Da_{r_i}$ to DA' and set index to $r_i$; otherwise the point is filtered out;
(1.4.4.4) set lastAngle to $Angle_{r_i}$.
(1.4.5) Finally, add $Da_{r_k}$ to DA'.
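The control flow of step (1.4) can be sketched as below. Neither the weighted-angle formula nor the angle threshold survives in this text, so both are assumptions here:

- `axis_angle` is a hypothetical stand-in for $Angle_{r_i}$. It measures the angle between the displacement from $Da_{index}$ to $Da_{r_i}$ and the first coordinate axis, using a dot product divided by the Euclidean distance $DIS_{r_i}$.
- `angle_threshold` is a free parameter.

Any other angle function can be passed in through `angle_fn`.

```python
import math
from typing import Callable, List, Optional, Sequence

Vec = Sequence[float]


def axis_angle(a: Vec, b: Vec) -> float:
    """Hypothetical Angle_{r_i}: acos(dot(b - a, e_1) / d(a, b)), in radians."""
    dis = math.dist(a, b)                    # plays the role of DIS_{r_i}
    if dis == 0:
        return 0.0
    disp = [bj - aj for aj, bj in zip(a, b)]
    axis = [1.0] + [0.0] * (len(disp) - 1)
    dot = sum(x * y for x, y in zip(disp, axis))
    return math.acos(max(-1.0, min(1.0, dot / dis)))


def direction_smooth(DA: Sequence[Vec], angle_threshold: float,
                     angle_fn: Callable[[Vec, Vec], float] = axis_angle) -> List[Vec]:
    """Step (1.4): keep the endpoints plus the points where the direction turns."""
    k = len(DA)
    if k == 0:
        return []
    out: List[Vec] = [DA[0]]                 # (1.4.2)
    index = 0                                # (1.4.3) index = r_1
    last_angle: Optional[float] = None       # None plays the role of lastAngle = -1
    for i in range(1, k - 1):                # (1.4.4) i = 2..k-1 in 1-based terms
        angle = angle_fn(DA[index], DA[i])
        if last_angle is not None and abs(last_angle - angle) > angle_threshold:
            out.append(DA[i])                # direction changed: keep the point
            index = i
        last_angle = angle                   # (1.4.4.4)
    if k > 1:
        out.append(DA[k - 1])                # (1.4.5)
    return out
```

Because the angle is always measured from the last kept point $Da_{index}$, a straight run produces nearly constant angles and is collapsed. A genuine turn changes the angle abruptly and is kept.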
(2) Statistical generative model. A probability distribution model of time-series segments is built from historical data. The probability distribution of a segment is assumed a priori to be a Gaussian mixture:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
Here M is the number of Gaussian components in the mixture, and $k_i$ is the weight of the i-th component, satisfying $\sum_{i=1}^{M} k_i = 1$. $N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian density, $u_i$ is the mean of the i-th component, and $\Sigma_i$ is its covariance matrix. M and all of $k_i$, $u_i$, $\Sigma_i$ are unknown and must be learned from historical data. Because system data keep growing and changing in practice, a real-time online learning method is designed so that the Gaussian mixture model can be corrected dynamically as data accumulate. The procedure is as follows:
(2.1) Choose the initial M in [1, 5], select N time-series segments $D^{(1)}, D^{(2)}, \dots, D^{(N)}$ from historical data, and generate the initial Gaussian mixture model with the standard EM algorithm.
(2.2) Keep updating the Gaussian mixture model as new time-series segments arrive:
(2.2.1) wait until R new segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \dots, ND^{(R)}$;
(2.2.2) set j = 1 and L = { }, and let H be the current Gaussian mixture model;
(2.2.3) compute $E^{(j)} = \{E_1, E_2, \dots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \dots, M\}$, i.e., the value of each Gaussian component for the newly arrived segment $ND^{(j)}$;
(2.2.4) normalize $E^{(j)}$:
$E^{(j)} = \left\{ \frac{E_1 - \min(E^{(j)})}{\max(E^{(j)}) - \min(E^{(j)})}, \dots, \frac{E_M - \min(E^{(j)})}{\max(E^{(j)}) - \min(E^{(j)})} \right\}$, where min and max return the minimum and maximum values respectively;
(2.2.5) set $I = \arg\max(E^{(j)})$ and $V = \max(E^{(j)})$;
(2.2.6) if V > 0.5, set $L = L \cup \{ND^{(j)}\}$; otherwise go to step (2.2.8);
(2.2.7) if $|L| \ge N$, run Gaussian-mixture clustering on all data in L with the EM algorithm to obtain a new model HL, set H = HL, and reset L = { };
(2.2.8) assign $ND^{(j)}$ to the I-th Gaussian component of H and recompute the mean of the I-th component by the following formula:
(2.2.9) set j = j + 1; if j > R, stop; otherwise return to step (2.2.3).
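Steps (2.2.2) to (2.2.9) can be sketched for one batch as below, under these assumptions:

- Covariances are diagonal.
- The mean update of step (2.2.8), whose formula is not preserved here, is taken to be an incremental running mean. This needs a hypothetical per-component `counts` field.
- The EM re-clustering of step (2.2.7) is supplied by the caller as `refit`.

`DiagGMM` and `online_update` are illustrative names. Read literally, the min-max normalization of step (2.2.4) makes V equal to 1 whenever the component values differ. So the V > 0.5 branch is taken for almost every sample; the sketch keeps the steps as written.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class DiagGMM:
    weights: List[float]           # k_i
    means: List[List[float]]       # u_i
    variances: List[List[float]]   # diagonal of Sigma_i (assumed diagonal)
    counts: List[int]              # hypothetical: samples absorbed by each component


def gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """N(x | u_i, Sigma_i) with a diagonal covariance."""
    log_n = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                       for xi, m, v in zip(x, mean, var))
    return math.exp(log_n)


def minmax(values: List[float]) -> List[float]:
    """Step (2.2.4); all-equal values would divide by zero, so return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def online_update(H: DiagGMM, batch: Sequence[Sequence[float]], N: int,
                  refit: Callable[[List[List[float]]], DiagGMM]
                  ) -> Tuple[DiagGMM, List[List[float]]]:
    """Process one batch ND^(1..R); return the updated model H and the set L."""
    L: List[List[float]] = []                                   # (2.2.2)
    for nd in batch:
        E = minmax([gauss_diag(nd, u, s)                        # (2.2.3)-(2.2.4)
                    for u, s in zip(H.means, H.variances)])
        i_best = max(range(len(E)), key=E.__getitem__)          # (2.2.5) I
        V = E[i_best]
        if V > 0.5:                                             # (2.2.6)
            L.append(list(nd))
            if len(L) >= N:                                     # (2.2.7)
                H = refit(L)
                L = []
                # Sketch choice: nd is already in the refit data and i_best
                # indexes the old model, so the incremental update is skipped.
                continue
        # (2.2.8) assign nd to component I; running-mean update is an assumption
        H.counts[i_best] += 1
        c = H.counts[i_best]
        H.means[i_best] = [m + (x - m) / c for m, x in zip(H.means[i_best], nd)]
    return H, L
```

Passing `refit` in as a function keeps the update loop independent of how the EM re-clustering is implemented.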
(3) Early-warning judgment. If, after T batches of new data have arrived (T is usually 2R to 10R), the size of set L has always stayed below N, the early-warning judgment process can start. Each new time-series segment is substituted into the Gaussian mixture model. A computed value below 0.1 indicates a low-probability segment, and a warning is issued for it.
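Step (3) can be sketched as two checks. The first is a stability gate: over the last T batches, the set L never reached N, which is recorded here as the largest |L| seen in each batch before any reset. The second is the density test against 0.1. Diagonal covariances are assumed, and all function names are illustrative. Note that the 0.1 threshold is applied to a probability density, so its meaning depends on the dimension and scale of the mapped features.

```python
import math
from typing import List, Sequence


def mixture_density(x: Sequence[float], weights: Sequence[float],
                    means: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[float]]) -> float:
    """p(x) = sum_i k_i N(x | u_i, Sigma_i), with diagonal Sigma_i assumed."""
    total = 0.0
    for k, u, s in zip(weights, means, variances):
        log_n = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                           for xi, m, v in zip(x, u, s))
        total += k * math.exp(log_n)
    return total


def warning_enabled(peak_l_sizes: Sequence[int], N: int, T: int) -> bool:
    """True once each of the last T batches kept |L| below N (peak size per batch)."""
    if T < 1:
        raise ValueError("T must be at least 1")
    recent = peak_l_sizes[-T:]
    return len(recent) == T and all(size < N for size in recent)


def early_warnings(segments: Sequence[Sequence[float]], weights, means, variances,
                   threshold: float = 0.1) -> List[int]:
    """Indices of segments whose mixture density falls below the threshold."""
    return [j for j, x in enumerate(segments)
            if mixture_density(x, weights, means, variances) < threshold]
```

For a single standard normal component, a segment at the mean has density about 0.399 and is not flagged. A segment three standard deviations away has density about 0.004 and is flagged.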
A specific application example of the invention follows. Some acute infectious diseases spread quickly, have long incubation periods and are easily misdiagnosed. For example, tuberculosis, a Class B infectious disease, is spread by droplets and has an incubation period of 2 to 3 weeks after infection. It is also easily misdiagnosed as a viral cold. This makes prevention and treatment very difficult, so timely early warning is necessary, especially when such a disease spreads rapidly on a large scale.
The method is used to monitor, online, the regional dosage of anti-tuberculosis drugs and antiviral cold drugs such as ethambutol, quinolones and loratadine. A statistical generative model is built to search for low-probability anomalous time-series data, which provides early warning of the potential spread of disease. The steps are:
1. Seven years of data on 34 anti-tuberculosis and antiviral cold drugs in a certain region were selected. For effective monitoring, dosage is aggregated per hour, and each 24-hour dosage series is taken as a minimum time-series segment. This gives 34 × 7 × 365 = 86,870 time-series segments and 34 × 7 × 365 × 24 = 2,084,880 data items in total.
2. Dosage data are influenced by external factors such as population and economy, so the data are normalized to remove these effects. For each year, the annual mean and standard deviation are computed; the mean is then subtracted from each data item and the result is divided by the standard deviation.
3. Year-long time-series segments (12 minimum time-series segments) are filtered with the feature-filtering method of the invention, and FIG. 1 compares the data before and after filtering. The filter keeps the direction-change characteristics of the time series and deletes gently varying data items.
4. The method of the invention is then used to estimate the probability distribution of the time-series segments, with the minimum time-series segment as the basic unit of estimation. The resulting probability distribution is shown in FIG. 2.
All time-series segments with a probability density below 0.1 were selected; there are two in this example. In month 11, quinolone dosage in the region rose and fell sharply, beyond its range over the past year (marked (1) in FIG. 2); the average probability density of this segment was 0.061. In the same month, cycloserine dosage increased suddenly compared with the past year (marked (2) in FIG. 2); the average probability density of this segment was 0.0396. The two anomalies differ in probability density, so they can be displayed visually, and warnings can be sent automatically to the relevant industry managers. This helps managers extract more valuable information from a large volume of drug data.
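Steps 1 and 2 of the example involve simple bookkeeping. Below is a sketch of the segment count and of the year-by-year normalization, assuming the normalization is applied to each drug's hourly series separately and uses the population standard deviation. The text does not specify either choice. `zscore_by_year` is an illustrative name.

```python
from statistics import mean, pstdev
from typing import Dict, List

DRUGS, YEARS, DAYS_PER_YEAR, HOURS_PER_DAY = 34, 7, 365, 24
N_SEGMENTS = DRUGS * YEARS * DAYS_PER_YEAR    # one 24-hour segment per drug per day
N_ITEMS = N_SEGMENTS * HOURS_PER_DAY          # one hourly dosage value per item


def zscore_by_year(hourly_by_year: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Normalise one drug's hourly dosages year by year: (x - mean) / std.

    Population standard deviation is assumed; a constant year maps to zeros.
    """
    out: Dict[int, List[float]] = {}
    for year, values in hourly_by_year.items():
        mu, sigma = mean(values), pstdev(values)
        out[year] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out


print(N_SEGMENTS, N_ITEMS)  # 86870 2084880
```

Normalizing within each year removes slow drifts in overall consumption, such as population growth. The anomaly model then sees deviations relative to that year's typical level.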
The foregoing is only a preferred embodiment of the invention and does not limit it. Although the invention has been disclosed through preferred embodiments, those skilled in the art may use the methods and techniques disclosed above to make many possible variations and modifications to the technical solution, or to modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change or refinement of the above embodiments that follows the technical essence of the invention, and does not depart from the content of its technical solution, still falls within the protection scope of that technical solution.
Claims (5)
1. An online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model, characterized by comprising the following steps:
(1) collecting medicine cloud spatio-temporal data as input to the early-warning method, the data consisting of a time series of fixed-length feature vectors that include patients' drug dosage and medication-time data; letting the medication feature vector at time t be $D_t = \langle d_{t1}, d_{t2}, \dots, d_{tp} \rangle$, so that $D = \langle D_1, D_2, \dots, D_T \rangle$ forms a sequence segment, where T is the maximum time index of the segment (its length);
(2) feature filtering, comprising an affine transformation and direction-smoothing filtering, as follows:
(2.1) applying an affine transformation to each feature vector to map it into a bounded p-dimensional space, and denoting the transformed feature vector at time t by $D'_t$;
(2.2) performing feature filtering in the mapped pixel space, as follows:
(2.2.1) input: the medication time-series segment $D = \langle D_1, D_2, \dots, D_T \rangle$ and the affine-transformed segment $D' = \langle D'_1, D'_2, \dots, D'_T \rangle$;
output: the filtered time-series segment $DA = \langle Da_{r_1}, Da_{r_2}, \dots, Da_{r_k} \rangle$, where $r_1, r_2, \dots, r_k \in \{1, 2, \dots, T\}$ and $k \le T$;
(2.2.2) traversing the components $D'_i$ of $D'$ in order ($i = 1, 2, \dots, T$):
(2.2.2.1) if $i = 1$ or $i = T$, adding $D_i$ to DA;
(2.2.2.2) otherwise, computing the Euclidean distance between $D'_{i-1}$ and $D'_i$, and, if it is greater than the distance threshold minDis, adding $D_i$ to DA;
(2.3) direction-smoothing filtering: first finding the weighted main direction of the time-series segment, then filtering according to it, as follows:
(2.3.1) input: the segment DA produced by the previous filtering step; output: the direction-smoothed segment DA';
(2.3.2) adding $Da_{r_1}$ to DA';
(2.3.3) initializing the variable index to $r_1$ and lastAngle to -1;
(2.3.4) traversing the components $Da_{r_i}$ of DA in order ($i = 2, \dots, k-1$):
(2.3.4.1) computing the distance from $Da_{index}$ to $Da_{r_i}$, denoted $DIS_{r_i}$;
(2.3.4.2) computing the weighted angle from $Da_{index}$ to $Da_{r_i}$, denoted $Angle_{r_i}$;
(2.3.4.3) if lastAngle ≠ -1 and $|lastAngle - Angle_{r_i}|$ is greater than the preset angle threshold, adding $Da_{r_i}$ to DA' and setting index to $r_i$; otherwise $Da_{r_i}$ is filtered out;
(2.3.4.4) setting lastAngle to $Angle_{r_i}$;
(2.3.5) finally, adding $Da_{r_k}$ to DA';
(3) computing a statistical generative model: building a probability distribution model of time-series segments from the medicine cloud spatio-temporal historical data of step (1), where the probability distribution of a segment is assumed a priori to be a Gaussian mixture:
$$p(D) = \sum_{i=1}^{M} k_i \, N(D \mid u_i, \Sigma_i)$$
where M is the number of Gaussian components in the mixture; $k_i$ is the weight of the i-th component, satisfying $\sum_{i=1}^{M} k_i = 1$; $N(D \mid u_i, \Sigma_i)$ is the i-th Gaussian density; $u_i$ is the mean of the i-th component; and $\Sigma_i$ is its covariance matrix. The Gaussian mixture model is learned online in real time and corrected dynamically as data accumulate, as follows:
(3.1) choosing the initial M in [1, 5], selecting N time-series segments $D^{(1)}, D^{(2)}, \dots, D^{(N)}$ from historical data, and generating the initial Gaussian mixture model with the standard EM algorithm;
(3.2) continuously updating the Gaussian mixture model as new time-series segments arrive, as follows:
(3.2.1) waiting until R new segments have arrived, denoted $ND^{(1)}, ND^{(2)}, \dots, ND^{(R)}$;
(3.2.2) setting j = 1 and L = { }, and letting H be the current Gaussian mixture model;
(3.2.3) computing $E^{(j)} = \{E_1, E_2, \dots, E_M\} = \{N(ND^{(j)} \mid u_i, \Sigma_i) \mid i = 1, 2, \dots, M\}$, i.e., the value of each Gaussian component for the newly arrived segment $ND^{(j)}$;
(3.2.4) normalizing $E^{(j)}$;
(3.2.5) setting $I = \arg\max(E^{(j)})$ and $V = \max(E^{(j)})$;
(3.2.6) if V > 0.5, setting $L = L \cup \{ND^{(j)}\}$; otherwise going to step (3.2.8);
(3.2.7) if $|L| \ge N$, running Gaussian-mixture clustering on all data in L with the EM algorithm to obtain a new model HL, setting H = HL, and resetting L = { };
(3.2.8) assigning $ND^{(j)}$ to the I-th Gaussian component of H and recomputing the mean of that component;
(3.2.9) setting j = j + 1; if j > R, the algorithm ends; otherwise returning to step (3.2.3);
(4) early-warning judgment: if, after T batches of new data have arrived, the size of set L has always stayed below N, starting early-warning judgment and issuing warnings for low-probability time-series segments.
2. The online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model according to claim 1, wherein in step (2.1) each feature vector is mapped by an affine transformation into a bounded p-dimensional space; the maximum length of each dimension is $L_i$, $i \in \{1, 2, \dots, p\}$, so that dimension i takes values in $[0, L_i]$; the transformed feature vector at time t is denoted $D'_t$; and the affine transformation is defined by the following formula:
where $d'_{ti}$ ($i = 1, 2, \dots, p$) is the i-th component of $D'_t$.
3. The online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model according to claim 1, wherein in step (2.3.4.2) the weighted angle $Angle_{r_i}$ is computed by the following formula:
in which × denotes the dot product of vectors and d denotes the Euclidean distance between two vectors.
5. The online early-warning method for big-data anomalies on a medicine cloud platform based on a statistical generative model according to claim 1, wherein in step (4) the early-warning judgment substitutes each new time-series segment into the Gaussian mixture model, and a computed value below 0.1 indicates a low-probability segment, for which a warning is issued.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911379506.7A CN111143435B (en) | 2019-12-27 | 2019-12-27 | Medicine cloud platform big data abnormity online early warning method based on statistical generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111143435A CN111143435A (en) | 2020-05-12 |
CN111143435B true CN111143435B (en) | 2021-04-13 |
Family ID: 70521108
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911379506.7A Active CN111143435B (en) | 2019-12-27 | 2019-12-27 | Medicine cloud platform big data abnormity online early warning method based on statistical generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111143435B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073883A (en) * | 2009-11-19 | 2011-05-25 | 夏普株式会社 | Method and equipment for detecting subsequence in time sequence data |
EP3244339A1 (en) * | 2016-05-10 | 2017-11-15 | Aircloak GmbH | Systems and methods for anonymized statistical database queries using noise elements |
CN108198596A (en) * | 2018-03-23 | 2018-06-22 | 顾泰来 | A kind of medical institutions' Drug use administration device, terminal and method |
CN109858748A (en) * | 2018-12-26 | 2019-06-07 | 航天信息股份有限公司 | It eats medicine and supervises intelligent terminal acquisition system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105205112A (en) * | 2015-09-01 | 2015-12-30 | 西安交通大学 | System and method for excavating abnormal features of time series data |
CN106992900A (en) * | 2016-01-20 | 2017-07-28 | 北京国双科技有限公司 | The method and intelligent early-warning notification platform of monitoring and early warning |
WO2017146930A1 (en) * | 2016-02-22 | 2017-08-31 | Rapiscan Systems, Inc. | Systems and methods for detecting threats and contraband in cargo |
US10324961B2 (en) * | 2017-01-17 | 2019-06-18 | International Business Machines Corporation | Automatic feature extraction from a relational database |
CN107180076B (en) * | 2017-04-18 | 2018-08-24 | 中国检验检疫科学研究院 | Pesticide residue visual method based on high resolution mass spectrum+internet+geography information |
US11741114B2 (en) * | 2017-12-19 | 2023-08-29 | ExxonMobil Technology and Engineering Company | Data analysis platform |
Non-Patent Citations (1)
Title |
---|
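Design and Implementation of a J2EE-Based Pharmaceutical Sales Management System; Chen Fuyuan; China Master's Theses Full-text Database, Information Science and Technology; 2018-07-15; pp. 1-50 * |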
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Nguyen et al. | Deepr: a convolutional net for medical records | |
CN106846361B (en) | Target tracking method and device based on intuitive fuzzy random forest | |
Zhang et al. | Clustering-based missing value imputation for data preprocessing | |
Sadiq et al. | Mining anomalies in medicare big data using patient rule induction method | |
CN110929752B (en) | Grouping method based on knowledge driving and data driving and related equipment | |
Thakran et al. | Unsupervised outlier detection in streaming data using weighted clustering | |
CN112992370B (en) | Unsupervised electronic medical record-based medical behavior compliance assessment method | |
CN113255841B (en) | Clustering method, clustering device and computer readable storage medium | |
Xie et al. | Retinal vascular image segmentation using genetic algorithm Plus FCM clustering | |
Winter et al. | Fast indexing strategies for robust image hashes | |
US8977061B2 (en) | Merging face clusters | |
WO2023029347A1 (en) | Multi-source data-based disease early warning method and apparatus, device, and storage medium | |
CN114218009A (en) | Time series abnormal value detection method, device, equipment and storage medium | |
WO2023082641A1 (en) | Electronic archive generation method and apparatus, and terminal device and storage medium | |
Jaiswal et al. | Deep learned cumulative attribute regression | |
CN110083724B (en) | Similar image retrieval method, device and system | |
CN111143435B (en) | Medicine cloud platform big data abnormity online early warning method based on statistical generation model | |
Külah et al. | COVID-19 forecasting using shifted Gaussian Mixture Model with similarity-based estimation | |
US11031044B1 (en) | Method, system and computer program product for self-learned and probabilistic-based prediction of inter-camera object movement | |
Settipalli et al. | Predictive and adaptive drift analysis on decomposed healthcare claims using ART based topological clustering | |
US20230068453A1 (en) | Methods and systems for determining and displaying dynamic patient readmission risk and intervention recommendation | |
Lin et al. | Proximity-aware hierarchical clustering of unconstrained faces | |
CN111507424B (en) | Data processing method and device | |
CN109872183A (en) | Intelligent Service evaluation method, computer readable storage medium and terminal device | |
Abubakar et al. | A convolutional neural network with K-neareast neighbor for image classification |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |