CN113724885B

CN113724885B - Abnormal sign identification method, device, equipment and storage medium

Info

Publication number: CN113724885B
Application number: CN202111133499.XA
Authority: CN
Inventors: 皇甫潇潇; 施建安; 庄一波; 程凌芳; 徐艺; 林家彬
Original assignee: Xiamen Yilianzhong Yihui Technology Co ltd
Current assignee: Xiamen Yilianzhong Yihui Technology Co ltd
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2023-12-05
Anticipated expiration: 2041-09-27
Also published as: CN113724885A

Abstract

The embodiments of the application provide a method, device, equipment and storage medium for identifying abnormal signs, relating to the technical field of big data analysis. The identification method comprises steps S1 to S6. S1: acquiring sign data. S2: updating the cluster number of the sign data by a dichotomy (bisection), and acquiring the grouping number of the sign data based on the cluster number and the sum of squared errors. S3: grouping the sign data according to the grouping number to obtain a plurality of data groups. S4: clustering each data group separately to obtain abnormal data. S5: acquiring identity information corresponding to the abnormal data to obtain an initial abnormal crowd. S6: acquiring time-series data of the sign data of the initial abnormal crowd to identify the final abnormal crowd with abnormal signs. People with abnormal signs are identified from the crowd by a clustering method, and combining the dichotomy with the sum of squared errors greatly improves clustering speed.

Description

Abnormal sign identification method, device, equipment and storage medium

Technical Field

The application relates to the technical field of big data analysis, in particular to a method, a device, equipment and a storage medium for identifying abnormal signs.

Background

Users generate a large amount of health sign data in their daily activities, and as society develops, more and more health data are networked. For example, citizen health platforms allow all health institutions across a city, including hospitals, disease control centers and community health centers, to share data over a network. Through such a platform, the Health Commission can share data ranging from birth and pregnancy records to death records.

As the volume of health data grows, how to use these data to create new value for society has gradually been put on the agenda. In particular, it has become important to use health data to find abnormal people, form situational awareness of abnormal signs, and track and give early warning of abnormal risks.

In view of the above, the applicant has studied the prior art and has made the present application.

Disclosure of Invention

The application provides a method, device, equipment and storage medium for identifying abnormal signs, in order to solve the problem in the related art that people with abnormal signs cannot be accurately found from a large amount of health data.

A first aspect,

The embodiment of the application provides a method for identifying abnormal signs, which comprises the steps S1 to S6.

S1, acquiring sign data;

S2, updating the cluster number of the sign data by a dichotomy, and acquiring the grouping number of the sign data based on the cluster number and the sum of squared errors;

S3, grouping the sign data according to the grouping number to obtain a plurality of data groups;

S4, clustering each data group separately to obtain abnormal data;

S5, acquiring identity information corresponding to the abnormal data to obtain an initial abnormal crowd;

S6, acquiring time-series data of the sign data of the initial abnormal crowd to identify the final abnormal crowd with abnormal signs.

In an alternative embodiment, step S2 specifically includes:

generating a plurality of initial cluster numbers based on a dichotomy;

calculating the sum of squared errors for each of the initial cluster numbers to obtain the interval in which the inflection-point cluster number lies;

selecting an intermediate cluster number from the interval based on a dichotomy, calculating its sum of squared errors, and updating the interval accordingly until the interval can no longer be halved, so as to obtain the inflection-point cluster number;

and acquiring the grouping number according to the inflection-point cluster number.

In an alternative embodiment, step S3 specifically includes:

grouping the sign data according to the grouping number, and performing discrete marking on each group to obtain a plurality of data groups, wherein the discrete marking is color-coded discrete marking.

In an alternative embodiment, step S4 specifically includes:

selecting, in each data group, a number of initial centroids corresponding to the grouping number;

clustering according to the distance between each data point in the data group and the initial centroids to obtain initial clusters;

calculating updated centroids from the initial clusters;

clustering according to the distance from each data point in the data group to the updated centroids to obtain updated clusters, and recalculating the updated centroids from the updated clusters until the centroid positions no longer change or change by less than a preset distance, thereby completing the clustering of each data group and obtaining its clusters;

acquiring the normal data range of each data group according to its clusters;

and extracting the abnormal data from each data group according to its normal data range.

In an alternative embodiment, step S6 specifically includes:

placing the initial abnormal crowd into an abnormal sandbox, and acquiring time-series data of the sign data of the initial abnormal crowd;

analyzing the time-series data with a continuous observation model to identify the final abnormal crowd with abnormal signs, wherein the continuous observation model is:

G is the quantity indicating whether the observed variable values conform to the theoretical expectation, i is the observation time, O is the observed value, and E is the expected value, where E is the mean of the data group.

A second aspect,

The embodiment of the application provides an abnormal sign identification device, which comprises:

a sign data acquisition module, used for acquiring sign data;

a grouping number acquisition module, used for updating the cluster number of the sign data by a dichotomy, and acquiring the grouping number of the sign data based on the cluster number and the sum of squared errors;

a grouping module, used for grouping the sign data according to the grouping number to obtain a plurality of data groups;

an abnormal data acquisition module, used for clustering each data group separately to obtain abnormal data;

an initial crowd acquisition module, used for acquiring identity information corresponding to the abnormal data to obtain an initial abnormal crowd;

a final crowd acquisition module, used for acquiring time-series data of the sign data of the initial abnormal crowd to identify the final abnormal crowd with abnormal signs.

In an alternative embodiment, the grouping number acquisition module includes:

an initial unit, used for generating a plurality of initial cluster numbers based on a dichotomy;

an interval unit, used for calculating the sum of squared errors for each initial cluster number to obtain the interval in which the inflection-point cluster number lies;

an inflection point unit, used for selecting an intermediate cluster number from the interval based on a dichotomy, calculating its sum of squared errors, and updating the interval accordingly until the interval can no longer be halved, so as to obtain the inflection-point cluster number;

a grouping number unit, used for acquiring the grouping number according to the inflection-point cluster number.

In an alternative embodiment, the grouping module is specifically configured to:

A third aspect,

The embodiment of the application provides an abnormal sign identification device, which comprises a processor, a memory and a computer program stored in the memory; the computer program is executable by the processor to implement a method of identifying abnormal signs as described in the first aspect.

A fourth aspect,

An embodiment of the present application provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program when executed controls a device where the computer readable storage medium is located to execute a method for identifying abnormal signs as described in the first aspect.

By adopting the technical scheme, the application can obtain the following technical effects:

The method identifies people with abnormal signs from the crowd by clustering, so that the abnormal crowd can be discovered early and receive timely prevention or treatment, which is of good practical significance. In addition, combining the dichotomy with the sum of squared errors greatly improves clustering speed.

In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for identifying abnormal signs according to a first embodiment of the present application.

Fig. 2 is a logic block diagram of a method for identifying abnormal signs according to a first embodiment of the present application.

Fig. 3 is a discrete distribution diagram of raw sample data.

Fig. 4 is a distribution diagram of the heart rate detection data of the actual sample population before clustering.

Fig. 5 is a distribution diagram of the heart rate detection data of the population after clustering.

Fig. 6 is a schematic structural diagram of an abnormal sign recognition device according to a second embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

For a better understanding of the technical solution of the present application, the following detailed description of the embodiments of the present application refers to the accompanying drawings.

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if determined" or "if (a stated condition or event) is detected" may be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".

References to "first/second" in the embodiments merely distinguish similar objects and do not imply a particular ordering of those objects. It should be understood that, where permitted, the specific order or sequence of "first/second" may be interchanged, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein.

The application is described in further detail below with reference to the attached drawings and detailed description:

Embodiment one:

Referring to Figs. 1 to 5, the first embodiment of the present application provides a method for identifying abnormal signs, which can be performed by an abnormal sign identification device. Specifically, steps S1 to S6 are implemented by one or more processors in the abnormal sign identification device.

S1, acquiring physical sign data.

It is understood that users generate a large amount of health sign data during daily activities. The sign data are the underlying base data, mainly a patient's health sign data such as blood pressure, blood glucose, heart rate and body temperature. Sources of the base sign data include sign monitoring records provided by patients, in-hospital measurement records of hospitalized patients, and periodic physical examination records.

The abnormal sign recognition device can be an electronic device with computing performance, such as a laptop computer, a desktop computer, a server, a smart phone or a tablet computer.

S2, updating the clustering quantity of the sign data by a dichotomy, and acquiring the grouping quantity of the sign data based on the clustering quantity and based on the error square sum.

Specifically, the sum of squared errors (SSE) measures how dispersed the samples are around their cluster centers, so it can be used as an index to compare the clustering effect of different cluster numbers. When the number of clusters is smaller than the true number, the SSE drops sharply as the number of clusters increases. Once the number of clusters reaches or exceeds the true number, the SSE decreases only slowly, eventually reaching 0 when every data point forms its own cluster. Updating the cluster number by a dichotomy makes it possible to quickly find the inflection point between the sharp and the slow decline of the SSE, that is, the actual number of clusters, which greatly speeds up the algorithm and has good practical significance.
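To make this SSE behaviour concrete, here is a minimal Python sketch (Python is chosen only for illustration) that assumes one-dimensional sign values such as heart rate. The helper name `sse` and the readings are hypothetical and invented for the example.

```python
def sse(points, centers):
    """Sum of squared errors: each point contributes the squared distance
    to its nearest center (1-D sign values such as heart rate)."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


heart_rates = [60.0, 62.0, 64.0, 100.0, 102.0, 104.0]  # made-up readings
print(sse(heart_rates, [82.0]))         # k = 1: 2416.0
print(sse(heart_rates, [62.0, 102.0]))  # k = 2: 16.0
print(sse(heart_rates, heart_rates))    # every point its own center: 0.0
```

Moving from one to two centers removes almost all of the error, while placing a center on every point drives the SSE to 0, which is the sharp-then-slow pattern the paragraph describes.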

On the basis of the above embodiment, in an optional embodiment of the present application, step S2 specifically includes steps S21 to S24.

S21, generating a plurality of initial cluster numbers based on a dichotomy. Specifically, the initial cluster numbers are set as powers of 2 up to a preset value; for example, if the preset value is 8, the initial cluster numbers are 2, 4 and 8. Generating the initial cluster numbers by a dichotomy avoids calculating the SSE for every possible cluster number, which gives a first improvement in calculation speed. Moreover, the true number of clusters is usually not large. Because consecutive powers of 2 are closer together at small values, the cluster number at which the inflection point lies can be found more quickly.
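The S21 rule can be sketched as below; `initial_cluster_numbers` and `preset_max` are illustrative names, not anything defined in the source.

```python
def initial_cluster_numbers(preset_max):
    """Initial cluster numbers for S21: powers of 2 (2, 4, 8, ...) up to preset_max."""
    numbers, k = [], 2
    while k <= preset_max:
        numbers.append(k)
        k *= 2
    return numbers


print(initial_cluster_numbers(8))  # [2, 4, 8]
```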

S22, calculating the sum of squared errors for each of the initial cluster numbers to obtain the interval in which the inflection-point cluster number lies. Specifically, because the initial cluster numbers are relatively far apart, they can only locate the interval containing the cluster number that corresponds to the inflection point (i.e., the true cluster number), not the inflection point itself.

S23, selecting a middle cluster number in the interval based on a dichotomy, calculating the error square sum of the middle cluster number, and updating the interval according to the error square sum of the middle cluster number until the interval cannot continue to be halved, so as to obtain the inflection point cluster number.

It can be understood that, once the interval containing the true cluster number is found, the interval is repeatedly halved until the inflection point is found, which greatly saves calculation time.

S24, acquiring the grouping number according to the inflection point clustering number.

It will be appreciated that the data of one person are taken as one data item. In the embodiment of the application, all data items are first treated as one cluster, and the initial cluster numbers are set by the dichotomy (2, 4, 8, …). The SSE corresponding to each initial cluster number is calculated, and the interval where the inflection point lies (for example, 2 < K < 8) is found from these SSE values. The interval (for example, 2 < K < 8) is then repeatedly bisected and updated until it can no longer be bisected, so that the inflection point, i.e., the grouping number, is found.
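The source does not say how the inflection point is detected numerically, so the sketch below rests on a hypothetical criterion: the SSE curve counts as "flat" at k once the drop from k to k + 1, relative to SSE(1), falls below `flat_tol`, and this test is assumed to switch from false to true only once as k grows (which is what makes bisection valid). `find_grouping_number` and `sse_of_k` are illustrative names; in practice `sse_of_k` would run the clustering for each k (ideally with caching), while here it is an invented table.

```python
def find_grouping_number(sse_of_k, preset_max, flat_tol=0.05):
    """Doubling probe (S21/S22) followed by bisection (S23) for the SSE elbow.

    sse_of_k must be defined for k = 1 .. preset_max + 1.
    """
    base = sse_of_k(1)

    def is_flat(k):
        # Hypothetical elbow test: marginal SSE drop from k to k + 1,
        # relative to SSE(1), is below flat_tol.
        return (sse_of_k(k) - sse_of_k(k + 1)) / base < flat_tol

    # Probe k = 1, 2, 4, ... so that the elbow lies in (lo, hi].
    lo, hi = 0, 1
    while not is_flat(hi):
        if hi * 2 > preset_max:
            return None  # no elbow found up to preset_max
        lo, hi = hi, hi * 2

    # Halve (lo, hi] until it cannot be halved any further.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_flat(mid):
            hi = mid
        else:
            lo = mid
    return hi  # S24: the inflection cluster number is used as the grouping number


# Made-up SSE values for k = 1..9, for illustration only.
SSE_TABLE = {1: 1000.0, 2: 400.0, 3: 100.0, 4: 90.0, 5: 82.0,
             6: 76.0, 7: 71.0, 8: 67.0, 9: 64.0}
print(find_grouping_number(SSE_TABLE.__getitem__, preset_max=8))  # 3
```

With these invented values the probe brackets the elbow in (2, 4], and a single bisection step settles on 3, so only a handful of SSE evaluations are needed instead of one per candidate k.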

In this step, a bisection-based heuristic calculation is first performed on the raw data to determine how many groups the global data should be divided into, after which the optimal data baseline range can be obtained through data analysis. Specifically, all of the massive collected base data are segmented and the segmented data are clustered. Because the optimal grouping number is determined by segmentation calculation before clustering, rather than being chosen subjectively, global optimization can be ensured.

S3, grouping the sign data according to the grouping number to obtain a plurality of data groups. Specifically, data processing first requires grouping the data, and the grouping criterion is defined according to the research objective. For example, to study the heart rates of people of different ages, the sign data need to be grouped by the ages of the corresponding people. It will be appreciated that samples within the same group share the same attributes, i.e., the same normal value range, whereas different groups have different normal ranges. Grouping the raw data (i.e., all sign data) according to the grouping number gathers individuals of the same type together, so that people whose signs differ from most people stand out clearly, making the identification result more accurate.

On the basis of the above embodiment, in an optional embodiment of the present application, step S3 specifically includes step S31.

S31, grouping the sign data according to the grouping number, and performing discrete marking on each group to obtain a plurality of data groups, wherein the discrete marking is color-coded discrete marking.

Specifically, the grouped data are discretely marked as follows: the grouping number N is determined, the base data are divided into N groups, and each group is discretely marked as the basis for subsequent clustering. In this embodiment, the discrete marking is color coding; in other embodiments, different groups may be represented by different clusters, and the application does not limit the specific form of the discrete marking.
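A minimal sketch of S31, assuming records are `(person_id, age, heart_rate)` tuples; the record layout, the two-way age split and the color list are hypothetical choices for the example, and `group_and_mark` is an illustrative name.

```python
from collections import defaultdict


def group_and_mark(records, group_key, marks):
    """Split records into groups by group_key and attach one discrete mark
    (here a color name) to each group, as in S31."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    if len(groups) > len(marks):
        raise ValueError("fewer marks than groups")
    # Sorting the group keys makes the key-to-mark assignment deterministic.
    return {key: (marks[i], groups[key]) for i, key in enumerate(sorted(groups))}


records = [("p1", 3, 120), ("p2", 30, 72), ("p3", 5, 182)]  # (person_id, age, heart_rate)
marked = group_and_mark(records, lambda r: "adult" if r[1] >= 18 else "minor",
                        ["red", "green", "blue"])
print(marked["minor"])  # ('green', [('p1', 3, 120), ('p3', 5, 182)])
```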

S4, clustering each data group separately to obtain abnormal data. After the grouping number is obtained, the raw data are grouped accordingly and each group is discretely marked, after which each group must be clustered. After clustering, similar data become increasingly tightly aggregated, so the normal data range can be defined by the range of this tight aggregation, and data lying outside it, i.e., outlying data, are stripped out as the initial abnormal data.

On the basis of the above embodiment, in an optional embodiment of the present application, step S4 specifically includes steps S41 to S46.

S41, selecting, in each data group, a number of initial centroids corresponding to the grouping number.

S42, clustering according to the distance between each data point in the data group and the initial centroids to obtain initial clusters.

S43, calculating updated centroids from the initial clusters.

S44, clustering according to the distance from each data point in the data group to the updated centroids to obtain updated clusters, and recalculating the updated centroids from the updated clusters until the centroid positions no longer change or change by less than a preset distance, thereby completing the clustering of each data group and obtaining its clusters.

Specifically, in this embodiment the number of centroids equals the grouping number, and each group is clustered accordingly. In other embodiments, the number of centroids in each group may be determined with the same method used to obtain the grouping number in step S2, and each group is then clustered with that number of centroids to obtain the clusters of each data group.
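S41 to S44 describe the standard iterative (Lloyd-style) K-means loop. The sketch assumes one-dimensional values such as heart rate and takes the initial centroids as an argument, since the source does not say how they are chosen; `kmeans_1d` is an illustrative name and `tol` plays the role of the preset distance.

```python
def kmeans_1d(values, initial_centroids, tol=1e-6, max_iter=100):
    """Iterative centroid clustering of S41-S44 for one data group of 1-D values."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iter):
        # S42/S44: assign every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # S43/S44: recompute each centroid as its cluster mean;
        # an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift < tol:  # centroids stopped moving (or moved less than tol)
            break
    return centroids, clusters


centroids, clusters = kmeans_1d([60, 62, 64, 100, 102, 104], initial_centroids=[60, 104])
print(centroids)  # [62.0, 102.0]
print(clusters)   # [[60, 62, 64], [100, 102, 104]]
```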

S45, acquiring the normal data range of each data group according to the cluster of each data group.

S46, extracting abnormal data from each data group according to the normal data range of each data group.
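The source does not specify how the normal range is derived from the clusters. As one possible reading, the sketch assumes that clusters holding at least a share `min_share` (a hypothetical parameter) of a group's points are normal, takes their minimum and maximum as the normal range, and flags every value outside it. The values are invented and only loosely echo the infant figures of the worked example.

```python
def normal_range_and_outliers(clusters, min_share=0.2):
    """S45/S46 sketch: clusters holding at least min_share of the group's
    points define the normal range; values outside that range are abnormal."""
    total = sum(len(c) for c in clusters)
    dense = [v for c in clusters if len(c) / total >= min_share for v in c]
    if not dense:
        raise ValueError("no cluster reaches min_share")
    low, high = min(dense), max(dense)
    abnormal = [v for c in clusters for v in c if not low <= v <= high]
    return (low, high), abnormal


# Made-up infant heart rates (beats/min) already split into two clusters.
clusters = [[100, 110, 115, 120, 120, 125, 130, 135, 140, 150], [180, 185]]
print(normal_range_and_outliers(clusters))  # ((100, 150), [180, 185])
```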

Specifically, after clustering is completed, the data of each group need to be stored. The main stored contents are the normal-range baseline, the abnormal data points and the patient information associated with the data.

S5, acquiring identity information corresponding to the abnormal data to obtain initial abnormal crowd.
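Combining the stored contents listed above with S5, a minimal sketch could keep, per group, the baseline and the abnormal points tagged with the ID of the person they came from. `GroupResult`, its fields and `initial_abnormal_crowd` are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class GroupResult:
    """Stored per-group output (hypothetical field names): the normal-range
    baseline and the abnormal points with their associated person IDs."""
    group: str
    baseline: tuple
    abnormal: list = field(default_factory=list)  # (person_id, value) pairs


def initial_abnormal_crowd(results):
    """S5: distinct person IDs behind the abnormal points, in first-seen order."""
    crowd = []
    for result in results:
        for person_id, _value in result.abnormal:
            if person_id not in crowd:
                crowd.append(person_id)
    return crowd


results = [GroupResult("infant", (100, 150), [("p3", 180), ("p9", 185)]),
           GroupResult("adult", (60, 100), [("p4", 130), ("p4", 128)])]
print(initial_abnormal_crowd(results))  # ['p3', 'p9', 'p4']
```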

S6, acquiring time sequence data of sign data of the initial abnormal crowd so as to identify the final abnormal crowd with abnormal signs.

Specifically, the extracted abnormal data cannot be determined with 100% certainty to reflect abnormal signs, because the measurement of sign data is itself subject to uncertainty from objective conditions. An abnormal reading does not necessarily mean that the user belongs to the abnormal crowd; it may instead be caused by environmental influences, for example an abnormal heart rate caused by intense exercise performed just before the measurement.

Therefore, the stripped-out abnormal data points are added to a data sandbox as key observation data for continuous observation, which ensures the accuracy of the abnormal data.

On the basis of the above embodiment, in an optional embodiment of the present application, step S6 specifically includes steps S61 to S62:

S61, placing the initial abnormal crowd into the abnormal sandbox, and acquiring time-series data of the sign data of the initial abnormal crowd.

S62, analyzing the time-series data with a continuous observation model to identify the final abnormal crowd with abnormal signs, wherein the continuous observation model is:

G is the quantity indicating whether the observed variable values conform to the theoretical expectation, i is the observation time, O is the observed value, and E is the expected value, where E is the mean of the data group.

Specifically, for more accurate confirmation, the screened abnormal data are placed in the abnormal sandbox so that the signs of the users concerned are observed continuously, and an observation model is established for further calculation to determine whether the initial abnormal crowd is the final abnormal target.

In addition, based on the identification results of the abnormal sign identification method, situation awareness, awareness of the condition and distribution of abnormal data, positioning and early warning of abnormal sign data, and key tracking of people with abnormal signs can be carried out. A basic situation-awareness trend graph can be obtained from the data processing results, in which the distribution of abnormal points stands out clearly. This allows the abnormal data crowd to be located clearly and quickly for data early warning and application. Meanwhile, crowd situation-awareness applications can be configured for the crowd corresponding to a specific abnormal sign in combination with specific business analysis (e.g., viewing the regional distribution of abnormal patients after the abnormal crowd is marked).

In the following, we will describe the identification method in a specific case so that the person skilled in the art can more clearly understand the overall scheme.

Step 1, acquiring sign data (see Table 1). The discrete distribution diagram of the original sign data (before segmentation) is shown in Fig. 3.

Table 1 Original sign data

Step 2, segmenting the whole of the base data with a half-segmentation (bisection) heuristic. Specifically:

a half-segmentation calculation is first performed on the raw data to determine how many groups the global data should be divided into, so that an optimal solution can be obtained. The segmentation and measurement procedure is as follows:

1. All points are first treated as one cluster.

2. The cluster is then split in two, i.e., k = 2.

3. k is looped over the doubling sequence k = 1, 2, 4, 8, …, and the SSE is calculated for each value until a value v is found such that the k giving better clustering lies between v/2 and v.

4. Half-segmentation (bisection) then continues between v/2 and v to find the best k value.

5. The k value at the inflection point, where the decrease of the SSE slows down over the whole process, is the optimal k value.

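$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^{2}$$

where k is the number of clusters, C_i is the i-th cluster, p is a sample in C_i, and m_i is the center point of the i-th cluster.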

6. Trying k values by repeated halving reduces the number of k values to be tried, i.e., the time complexity, from O(n) to O(log₂ n).

Based on the above sample data, calculation shows that when k = 3, i.e., when the data are divided into 3 groups, the decrease of the SSE levels off, so this k value gives the optimal grouping.

Step 3, grouping the sign data. Based on the above result (k = 3), the original sample data are re-marked with color-coded discrete labels by age: the sample data are divided into k = 3 groups, and infants (0 to 6 years), children (7 to 15 years) and adults (18 to 25 years) are each labeled and colored separately. The discrete chart after color-coded marking is shown in Fig. 4, in which the groups are marked with different colors.
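The age split of the worked example can be written as a grouping key. This is only an illustration assuming integer ages; the source leaves ages between or beyond the stated bands (16 to 17, or over 25) unassigned, so the hypothetical `age_band` helper returns `None` for them.

```python
def age_band(age):
    """Age groups of the worked example; ages outside the stated bands
    (16-17, or over 25) are not covered by the source and return None."""
    if 0 <= age <= 6:
        return "infant"
    if 7 <= age <= 15:
        return "child"
    if 18 <= age <= 25:
        return "adult"
    return None


print([age_band(a) for a in (3, 10, 17, 22)])  # ['infant', 'child', None, 'adult']
```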

Table 2 Crowd heart rate detection data after grouping of the raw data (unit: beats/min)

Step 4, clustering each group to obtain abnormal data. The specific analysis steps are as follows:

1. Three data points are selected from the 3 groups of data as the initial centroids (centroid 1, centroid 2 and centroid 3; the different age groups are distinguished by color in the figure).

2. For each data point of the same color in a group, the distances to centroid 1, centroid 2 and centroid 3 are calculated, and the point is assigned to the cluster of the nearest centroid.

3. Each centroid has now gathered a batch of discrete data; a new centroid is calculated from each batch, and the second step is repeated.

4. The clustering process can be terminated when the distance between the new and old centroids no longer changes significantly, i.e., the clusters have stabilized and converged.

The final clustering result is shown in Fig. 5. Throughout the process no supervision is needed: the data are only divided into several groups at the beginning, the remaining work is completed by the clustering flow, and the data are classified using unsupervised machine learning.

5. The clustering results show that:

the normal heart rate baseline of the infants in the sample data is 100 to 150 beats/min;

the abnormal heart rates stripped out of the infant sample population are 180 and 185 beats/min;

the normal heart rate baseline of the children in the sample data is 80 to 110 beats/min;

the abnormal heart rate stripped out of the child sample population is 150 beats/min;

the normal heart rate baseline of the adults in the sample data is 60 to 100 beats/min;

the abnormal heart rate stripped out of the adult sample population is 130 beats/min.
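As a check on the figures above, the reported baselines can be applied directly to the stripped-out values. `BASELINES` and `outside_baseline` are illustrative names, and the 72 beats/min adult reading is an invented in-range value added for contrast.

```python
# Normal heart rate baselines reported in the worked example (beats/min).
BASELINES = {"infant": (100, 150), "child": (80, 110), "adult": (60, 100)}


def outside_baseline(group, heart_rate):
    """True if the reading falls outside its group's normal baseline."""
    low, high = BASELINES[group]
    return not low <= heart_rate <= high


for group, rate in [("infant", 180), ("infant", 185), ("child", 150),
                    ("adult", 130), ("adult", 72)]:
    print(group, rate, outside_baseline(group, rate))
```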

And 5, establishing a continuous observation model for crowd inclusion abnormal sandboxes corresponding to the abnormal target data to analyze and confirm.

And adding the stripped data abnormal points serving as key observation data into a data sandbox for continuous data observation. The data model observation is added to ensure the accuracy of the abnormal data, because the measurement of the human body sign data itself has uncertainty affected by objective conditions with respect to the characteristics of the analysis data. The abnormal physical sign of the user is not known to be an abnormal crowd, and the abnormal physical sign is possibly caused by heart rate abnormality and other environmental influences caused by intense exercise just performed when the user measures at the moment.

Therefore, in order to perform more accurate confirmation capture, we incorporate the abnormal data screened out by the steps into an abnormal sandbox for continuous observation, build an observation model, and observe whether the variable value accords with the theoretical expected ratio. The observation model is as follows:

wherein O is the observed value, E is the expected value, and if the data volume and data frequency collected by us are stable, we assume that we observe continuously for one month (30 days)

For each classified age group, the average heart rates E1, E2, …, E30 are calculated over the 30 days. Once the model is built, the G value is calculated by applying the above formula to the heart-rate values O1, O2, …, O30 of the target patient corresponding to the abnormal data.
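The formula for G did not survive extraction. The claims describe G as measuring whether the observed values O_i agree with the expected values E_i over observation times i, and the description calls it a "G value", which matches the standard G-test (log-likelihood ratio) statistic G = 2 Σ_i O_i ln(O_i / E_i). Assuming that form, a minimal Python sketch for one target patient over the 30-day window could look like this; the function name `g_statistic` and the example values are hypothetical.

```python
import math
from typing import Sequence


def g_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Assumed form of the observation model: G = 2 * sum_i O_i * ln(O_i / E_i)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same observation days")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected values must be positive")
        if o > 0:  # O * ln(O / E) tends to 0 as O -> 0, so zero readings add nothing
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical example: a patient whose daily heart rate sits above the group mean.
expected = [120.0] * 30   # E1..E30: per-day age-group averages (illustrative values)
observed = [150.0] * 30   # O1..O30: the target patient's readings (illustrative values)
print(round(g_statistic(observed, expected), 2))  # prints 2008.29
```

The textbook G-test compares counts whose totals agree. The sketch does not enforce that, so applying it to daily heart-rate values follows the description's usage rather than the test's usual preconditions.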

The maximum confidence of the target abnormal data is then determined from the model's baseline values, and it is judged whether this confidence meets the user's minimum requirement, so as to determine which individuals belong to the patient population with abnormal heart rates.
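The description does not say how the "maximum confidence" is computed. One common reading, assumed here, is the G-test's: if a person's readings really follow the expected values, G approximately follows a chi-square distribution with (observation days - 1) degrees of freedom, so the confidence that the person deviates is 1 minus the upper-tail probability of G. The sketch below computes that tail with the regularized incomplete gamma function (series plus continued-fraction scheme, as in Numerical Recipes) using only the standard library. `min_confidence` stands in for the user's minimum requirement, and all names are hypothetical.

```python
import math


def _reg_gamma_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for P(a, x); converges quickly when x < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x) (modified Lentz method) when x >= a + 1.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    return _reg_gamma_q(df / 2.0, x / 2.0)


def is_confirmed_abnormal(g: float, days: int = 30, min_confidence: float = 0.95) -> bool:
    """Assumed rule: confirm the person as abnormal when 1 - p-value >= min_confidence."""
    if days < 2:
        raise ValueError("need at least two observation days")
    confidence = 1.0 - chi2_sf(g, days - 1)
    return confidence >= min_confidence


print(is_confirmed_abnormal(60.0))  # prints True: G = 60 is far in the tail for 29 d.o.f.
```

Keeping the tail computation in the standard library avoids a SciPy dependency; with SciPy available, `scipy.stats.chi2.sf` would serve the same role.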

A second aspect,

The sign data acquisition module 1 is configured to acquire sign data.

The grouping number acquisition module 2 is configured to update the number of clusters of the sign data by a dichotomy and to obtain the grouping number of the sign data from the number of clusters and the sum of squared errors.

A grouping module 3, configured to group the sign data according to the number of groups, so as to obtain a plurality of data groups.

The abnormal data acquisition module 4 is configured to cluster each data group separately to obtain abnormal data.

The initial crowd obtaining module 5 is configured to obtain identity information corresponding to the abnormal data, so as to obtain an initial abnormal crowd.

The final crowd acquisition module 6 is used for acquiring time sequence data of the sign data of the initial abnormal crowd so as to identify the final abnormal crowd with abnormal signs.

In an alternative embodiment, the grouping number acquisition module 2 includes:

an initial unit, configured to generate a plurality of initial cluster numbers based on the dichotomy;

an interval unit, configured to calculate the sum of squared errors for each of the initial cluster numbers, so as to obtain the interval in which the inflection-point cluster number lies; and

an inflection point unit, configured to select a middle cluster number from the interval based on the dichotomy, calculate the sum of squared errors for that middle cluster number, and update the interval according to it, repeating until the interval can no longer be halved, so as to obtain the inflection-point cluster number.
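The text does not state the test used to decide which half of the interval contains the inflection point. A minimal Python sketch, assuming "dichotomy" means bisection over the cluster number k and assuming the elbow is the smallest k at which adding one more cluster reduces the SSE by less than a hypothetical fraction `threshold` of SSE(1). If SSE(k) falls with diminishing returns, that predicate is monotone in k, so it can be bisected; the sketch folds the initial, interval, and inflection-point units into one loop. The `sse` callable (for example, one k-means run per k) and all names are illustrative.

```python
from functools import lru_cache
from typing import Callable


def find_elbow_k(sse: Callable[[int], float], k_max: int,
                 threshold: float = 0.05) -> int:
    """Bisect for the smallest k whose next cluster cuts SSE by < threshold * SSE(1)."""
    if k_max < 2:
        return 1

    @lru_cache(maxsize=None)
    def cached_sse(k: int) -> float:
        # Each SSE evaluation is a full clustering run, so compute it once per k.
        return sse(k)

    base = cached_sse(1)
    if base == 0:
        return 1  # all points identical: one cluster already explains everything

    def flat_enough(k: int) -> bool:
        # Relative SSE reduction gained by going from k to k + 1 clusters.
        return (cached_sse(k) - cached_sse(k + 1)) / base < threshold

    lo, hi = 1, k_max - 1
    if not flat_enough(hi):
        return k_max  # the curve never flattens inside the search range
    # Invariant: the elbow lies in [lo, hi] and flat_enough(hi) holds.
    while lo < hi:
        mid = (lo + hi) // 2
        if flat_enough(mid):
            hi = mid       # elbow is at mid or to its left
        else:
            lo = mid + 1   # elbow is strictly to the right of mid
    return lo


# Illustrative SSE curve (not data from the patent): sharp drops, then a plateau.
curve = {1: 1000.0, 2: 400.0, 3: 150.0, 4: 120.0, 5: 100.0, 6: 90.0, 7: 85.0, 8: 82.0}
print(find_elbow_k(curve.__getitem__, k_max=8))  # prints 3
```

Caching matters because each probe evaluates SSE at both k and k + 1, and neighbouring probes often share one of those values.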

In an alternative embodiment, the grouping module 3 is specifically configured to:

group the sign data according to the number of groups and apply discrete marking to each group to obtain a plurality of data groups, where the discrete marking is a coloring process that marks each group with its own color.

In an alternative embodiment, the abnormal data acquisition module 4 is specifically configured to:

select, based on the number of groups, a corresponding number of initial centroids in each data group;

assign each data point in the data group to a cluster according to its distance from the initial centroids, to obtain initial clusters;

calculate new centroids from the initial clusters;

reassign each data point in the data group according to its distance from the calculated centroids, to obtain calculated clusters, and recalculate and update the centroids from the calculated clusters, repeating until the centroid positions no longer change or move by less than a preset value, which completes the clustering of each data group and yields its clusters;

obtain the normal data range of each data group from its clusters; and

extract the abnormal data from each data group according to its normal data range.
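The steps above describe Lloyd's k-means iteration followed by an outlier split. A minimal Python sketch for one data group of scalar readings (such as heart rates), with two assumptions the text does not specify: the initial centroids are k evenly spaced order statistics, and the "normal data range" spans the clusters that hold at least a hypothetical fraction `min_share` of the readings. All names are illustrative.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, tol: float = 1e-6,
              max_iter: int = 100) -> List[List[float]]:
    """Lloyd's k-means on scalar readings; returns the non-empty clusters."""
    data = sorted(values)
    # Assumption: initial centroids are k evenly spaced order statistics.
    step = max(k - 1, 1)
    centroids = [data[(i * (len(data) - 1)) // step] for i in range(k)]
    clusters: List[List[float]] = []
    for _ in range(max_iter):
        # Assign every reading to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Recompute centroids as cluster means (an empty cluster keeps its old centroid).
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        shift = max(abs(new - old) for new, old in zip(updated, centroids))
        centroids = updated
        # Stop once every centroid moves by less than the preset tolerance.
        if shift < tol:
            break
    return [c for c in clusters if c]


def split_abnormal(values: List[float], k: int,
                   min_share: float = 0.2) -> Tuple[Tuple[float, float], List[float]]:
    """Return (normal range, abnormal readings) for one data group."""
    clusters = kmeans_1d(values, k)
    n = len(values)
    # Assumption: clusters holding at least min_share of the readings are "normal";
    # fall back to the largest cluster if none qualifies.
    normal = [c for c in clusters if len(c) / n >= min_share] or [max(clusters, key=len)]
    low = min(min(c) for c in normal)
    high = max(max(c) for c in normal)
    abnormal = [x for x in values if x < low or x > high]
    return (low, high), abnormal


# Illustrative readings chosen to mirror the infant figures above (not patent data).
readings = [100, 105, 110, 120, 125, 130, 140, 145, 150, 180, 185]
print(split_abnormal(readings, k=3))  # prints ((100, 150), [180, 185])
```

Per the description, `split_abnormal` would be called once per data group (for example, infants, children, and adults), each with its own readings.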

In an alternative embodiment, the final crowd acquisition module 6 is specifically configured to:

bring the initial abnormal crowd into an abnormal sandbox and acquire time-series data of the sign data of the initial abnormal crowd; and

analyze the time-series data with a continuous observation model to identify the final abnormal crowd with abnormal signs, where the continuous observation model is:

A third aspect,

The embodiment of the application provides an abnormal sign identification device, which comprises a processor, a memory and a computer program stored in the memory. The computer program is executable by the processor to implement a method of identifying abnormal signs as described in the first aspect.

A fourth aspect,

An embodiment of the present application provides a computer readable storage medium, the computer readable storage medium including a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium is located to execute a method for identifying an abnormal sign as described in the first aspect.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above description is only of the preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A method for identifying an abnormal sign, comprising:

acquiring physical sign data;

updating the clustering quantity of the sign data by a dichotomy, and acquiring the grouping quantity of the sign data based on the clustering quantity and based on the error square sum;

grouping the sign data according to the grouping number to obtain a plurality of data groups;

clustering is carried out on each data group respectively so as to obtain abnormal data;

acquiring identity information corresponding to the abnormal data to obtain initial abnormal crowd;

acquiring time sequence data of the sign data of the initial abnormal crowd to identify the final abnormal crowd with abnormal signs;

wherein acquiring the grouping number of the sign data through the sum of squared errors, with the number of clusters updated by a dichotomy, specifically comprises:

generating a plurality of initial cluster numbers based on a dichotomy;

acquiring the grouping number according to the inflection-point cluster number;

wherein acquiring time-series data of the sign data of the initial abnormal crowd to identify the final abnormal crowd with abnormal signs specifically comprises:

wherein G measures whether the observed variable values agree with the theoretical expectation, i is the observation time, O is the observed value, and E is the expected value.

2. The method for identifying abnormal signs according to claim 1, wherein grouping the sign data according to the number of groups to obtain a plurality of data groups specifically comprises:

3. The method for identifying abnormal signs according to claim 1, wherein clustering each of the data groups to obtain abnormal data specifically comprises:

4. An apparatus for identifying an abnormal sign, comprising:

the sign data acquisition module is used for acquiring sign data;

the final crowd acquisition module is used for acquiring time sequence data of the sign data of the initial abnormal crowd so as to identify the final abnormal crowd with abnormal signs;

the grouping number acquisition module includes:

a grouping number unit, configured to obtain the grouping number according to the inflection point cluster number;

the final crowd acquisition module is specifically configured to:

bringing the initial abnormal crowd into an abnormal sandbox, and acquiring time sequence data of sign data of the initial abnormal crowd;

5. The apparatus for identifying abnormal signs according to claim 4, wherein the grouping module is specifically configured to:

6. An apparatus for identifying an abnormal sign, comprising a processor, a memory, and a computer program stored in the memory; the computer program being executable by the processor to implement a method of identifying abnormal signs as claimed in any one of claims 1 to 3.

7. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the method of identifying abnormal signs as claimed in any one of claims 1 to 3.