WO2020155756A1 - Method and device for optimizing the proportion of abnormal points based on clustering and SSE

Publication number
WO2020155756A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
abnormal
point
current
clustering
Prior art date
Application number
PCT/CN2019/117363
Other languages
English (en)
Chinese (zh)
Inventor
杨志鸿
徐亮
阮晓雯
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155756A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • This application relates to the technical field of intelligent decision-making, and in particular to a method and device for optimizing the proportion of abnormal points based on clustering and SSE.
  • Outlier analysis is the process of checking whether data contain input errors or unreasonable values. Ignoring outliers is dangerous: including them in the calculation and analysis of the data without eliminating them adversely affects the results.
  • The embodiments of the present application provide a method, device, computer equipment and storage medium for optimizing the proportion of abnormal points based on clustering and SSE. They aim to solve a problem in the prior art: massive user data often contain multiple normal point centers, so performing outlier detection without first dividing the data leads to poor discrimination by the unsupervised model used for outlier detection and to an inability to detect abnormal data finely.
  • In a first aspect, an embodiment of the present application provides a method for optimizing the proportion of abnormal points based on clustering and SSE, which includes:
  • obtaining the residual variation range;
  • using the current abnormal point ratio plus the step size as the optimal abnormal point ratio; and
  • classifying the selected cluster according to the single-class support vector machine and the optimal abnormal point ratio to obtain the optimal classification result.
  • an embodiment of the present application provides a device for optimizing the proportion of abnormal points based on clustering and SSE, which includes:
  • the clustering unit is configured to receive a set of data points to be classified, and cluster the set of data points to be classified through k-means clustering to obtain multiple clusters;
  • the multi-model construction unit is used to obtain the data points corresponding to each of the multiple clusters, and to construct, according to the preset current abnormal point ratio and each cluster, a single-class support vector machine for outlier detection in one-to-one correspondence with each cluster;
  • the normal point center obtaining unit is used to classify the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain the normal point center of the normal category in the classification result;
  • the first residual calculation unit is configured to obtain the residual sum of squares of each data point of the abnormal category in the classification result and the center of the normal point to obtain the current residual sum of squares;
  • the first ratio update unit is configured to subtract a preset step size from the current abnormal point ratio to update the current abnormal point ratio;
  • the second residual calculation unit is used to classify the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain data points of the current abnormal category, and to obtain the residual sum of squares between each data point of the current abnormal category and the normal point center as the next residual sum of squares;
  • the amplitude calculation unit is configured to divide the difference between the next residual sum of squares and the current residual sum of squares by the step size to obtain the residual variation range;
  • the optimal ratio obtaining unit is configured to, if the residual variation range exceeds the variation range threshold, use the current abnormal point ratio plus the step size as the optimal abnormal point ratio;
  • the optimal classification unit is used to classify the selected cluster according to the single-class support vector machine and the optimal abnormal point ratio to obtain the optimal classification result.
  • An embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the method for optimizing the proportion of abnormal points based on clustering and SSE described in the first aspect above.
  • The embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to execute the method for optimizing the proportion of abnormal points based on clustering and SSE described in the first aspect above.
  • FIG. 1 is a schematic flowchart of a method for optimizing the proportion of abnormal points based on clustering and SSE provided by an embodiment of the application;
  • FIG. 2 is a schematic diagram of a sub-process of the method for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 3 is a schematic diagram of another sub-process of the method for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 4 is a schematic diagram of another sub-process of the method for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 5 is another flow diagram of the method for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 6 is a schematic block diagram of a device for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 7 is a schematic block diagram of subunits of the device for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 8 is a schematic block diagram of another subunit of the device for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 9 is a schematic block diagram of another subunit of the device for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 10 is another schematic block diagram of the device for optimizing the proportion of abnormal points based on clustering and SSE according to an embodiment of the application;
  • FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • FIG. 1 is a schematic flowchart of an SSE-based abnormal point ratio optimization method provided by an embodiment of the application.
  • The SSE-based abnormal point ratio optimization method is applied to a server and is executed by application software installed on the server.
  • the method includes steps S101 to S181.
  • S101 Receive a set of data points to be classified, and cluster the set of data points to be classified through k-means clustering to obtain multiple clusters.
  • these business data can be regarded as a collection of data points to be classified.
  • the set of data points to be classified may be the user's insurance policy data, including at least fields such as the name of the applicant, the age of the applicant, the number of the applicant's insurance policy, the amount of insurance, the insurance period, and the phone number of the applicant.
  • One field can be selected as the main data, and the remaining fields then serve as attribute data of that main field.
  • For example, the insurance period field is used as the main data, and fields such as the telephone number and ID number of the applicant are used as its attribute data.
  • step S101 includes:
  • S1012 Divide the set of data points to be classified according to the difference between each data point in the set of data points to be classified and each initial cluster center to obtain an initial clustering result
  • the k-means algorithm is used when clustering the set of data points to be classified, and the process is as follows:
  • The specific calculation is to take the arithmetic mean of the primary attributes of all data points to be classified in each cluster and to select the data point closest to that mean as the new cluster center, thereby reselecting a better cluster center within the cluster.
  • Repeat step d) until the clustering result no longer changes, which yields the clustering result corresponding to the preset number of clusters.
  • the massive collection of data points to be classified can be grouped quickly to obtain multiple clusters.
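The clustering procedure described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `kmeans_nearest_point_centers`, the use of Euclidean distance as the "difference" measure, the default of taking the first k points as initial centers, and the `max_iter` cap are all assumptions made for the example.

```python
import math

def kmeans_nearest_point_centers(points, k, initial_centers=None, max_iter=100):
    """Group points (tuples of floats) into k clusters.

    After each assignment pass, a cluster's center becomes the member closest
    to the cluster's arithmetic mean, so centers are always real data points.
    """
    # Assumption: default to the first k points as the initial cluster centers.
    centers = list(initial_centers) if initial_centers else list(points[:k])
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the index of its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:  # clustering result no longer changes
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if not members:
                continue  # keep the old center for an empty cluster
            mean = tuple(sum(col) / len(members) for col in zip(*members))
            centers[j] = min(members, key=lambda p: math.dist(p, mean))
    clusters = [[p for p, a in zip(points, assignment) if a == j] for j in range(k)]
    return clusters, centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, centers = kmeans_nearest_point_centers(points, k=2)
print(centers)
```

Snapping each center to an actual data point follows the text's rule of choosing the data point closest to the arithmetic mean; ordinary k-means would use the mean itself.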
  • the server receives the set of data points to be classified uploaded by the business end and completes the clustering and grouping
  • For example, the initial current abnormal point ratio is set to 0.5 and denoted $m_0$.
  • With such a high initial ratio, the abnormal point category contains a large number of misclassified normal points.
  • a single-class support vector machine for outlier detection is constructed according to the preset current proportion of abnormal points and the samples to be classified, as a model basis for subsequent adjustment of the current proportion of abnormal points and reclassification.
  • step S110 includes:
  • S111 Obtain the first parameter and the second parameter of the hyperplane of the single-class support vector machine corresponding to each cluster, according to the preset current abnormal point ratio and each cluster;
  • S112 According to the first parameter and the second parameter of the hyperplane and the current abnormal point ratio, construct a single-class support vector machine for abnormal point detection in one-to-one correspondence with each cluster.
  • the single-class support vector machine is OneClassSVM, and its classification model is as follows:
  • $\xi_i$ represents the slack variable.
  • $\nu$ is an upper bound on the fraction of outliers and a lower bound on the fraction of training examples used as support vectors.
  • This method creates a hyperplane with parameters $w$ and $b$ that has the largest distance from the origin in the feature space and separates the origin from all data points.
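The model formula itself did not survive in this text. For orientation only, the widely used formulation of the one-class support vector machine is reproduced below; it is an assumption that the application uses this form. Here $n$ is the number of training points, $\Phi$ is the feature map, and $\rho$ is the offset of the hyperplane, which presumably plays the role the text assigns to the second parameter $b$.

```latex
\min_{w,\;\xi,\;\rho}\quad \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\qquad \text{subject to} \qquad
\langle w, \Phi(x_i) \rangle \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

Under this formulation a point $x$ is labeled normal when $\langle w, \Phi(x) \rangle - \rho \ge 0$ and abnormal otherwise, and $\nu$ corresponds to the abnormal point ratio that the method adjusts.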
  • each cluster is classified according to its corresponding single-class support vector machine.
  • S120 Classify the selected clusters according to the single-class support vector machine and the current abnormal point ratio, and obtain the normal point center of the normal category in the classification result.
  • Take as an example the case where one of the multiple clusters is selected as the target cluster for obtaining the optimal abnormal point ratio. After the selected cluster is classified by the single-class support vector machine according to the initially set current abnormal point ratio, the normal point center corresponding to the data points of the normal category in the classification result can be determined, and this normal point center remains constant in the subsequent process.
  • step S120 includes:
  • The selected cluster is first classified according to the single-class support vector machine and the current abnormal point ratio, yielding a classification result that includes data points of the normal category and data points of the abnormal category.
  • With the normal point center fixed, the abnormal point ratio can be adjusted continuously, and the optimal abnormal point ratio can be obtained from the trend of a specified parameter (such as the average Euclidean distance between each data point of the current abnormal category and the normal point center).
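A minimal sketch of how the normal point center might be computed, following the device description later in the text (average the normal-category points, then take the normal point nearest to that average). The function name `normal_point_center` and the use of Euclidean distance are assumptions for illustration.

```python
import math

def normal_point_center(normal_points):
    """Return the normal-category data point closest to the mean of all normal points."""
    n = len(normal_points)
    # Initial normal point center: coordinate-wise arithmetic mean.
    mean = tuple(sum(col) / n for col in zip(*normal_points))
    # Adjusted center: the actual data point nearest to that mean.
    return min(normal_points, key=lambda p: math.dist(p, mean))

print(normal_point_center([(1.0, 1.0), (2.0, 2.0), (4.0, 3.0)]))
```

Because the center is computed once from the initial classification and then held fixed, later residual sums of squares are all measured against the same reference point.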
  • The residual sum of squares is a measure of the degree of model fit in a linear model.
  • Such fitting is a data-processing method that uses a continuous curve to approximate discrete points on a plane in order to represent the functional relationship between their coordinates.
  • $\sum V^2 = V_1^2 + V_2^2 + \dots + V_n^2$
  • Here $V_i$ is the residual of the measured data $l_i$. For example, the difference between a data point $l_i$ of the abnormal category and the normal point center can represent the residual of $l_i$.
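The residual sum of squares against the normal point center can be sketched as below. Treating the residual of a multi-dimensional data point as its Euclidean distance to the center is an assumption, and `residual_sum_of_squares` is a hypothetical name.

```python
def residual_sum_of_squares(abnormal_points, center):
    """Sum of squared Euclidean distances from each abnormal point to the center."""
    return sum(
        sum((a - c) ** 2 for a, c in zip(point, center))
        for point in abnormal_points
    )

# Two abnormal points measured against a center at the origin: 3^2 + 4^2 + 0^2 + 2^2
print(residual_sum_of_squares([(3.0, 4.0), (0.0, 2.0)], (0.0, 0.0)))  # 29.0
```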
  • S140 Subtract a preset step length from the current abnormal point ratio to update the current abnormal point ratio.
  • the purpose of subtracting the preset step size from the current abnormal point ratio is to continuously adjust the current abnormal point ratio so as to obtain the optimal abnormal point ratio through the trial method.
  • S150 Classify the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain data points of the current abnormal category, and obtain the residual sum of squares between each data point of the current abnormal category and the normal point center as the next residual sum of squares.
  • After the current abnormal point ratio has been updated by subtracting the step size, the normal point center does not need to be determined again. Only the data points of the abnormal category in the new classification result are obtained, and the residual sum of squares between each of them and the normal point center is calculated as the next residual sum of squares.
  • The current residual sum of squares obtained in step S130 is denoted $SSE_0$, and the next residual sum of squares obtained in the first execution of step S150 is denoted $SSE_1$.
  • The next residual sum of squares obtained in the second execution of step S150 is denoted $SSE_2$ (the corresponding current residual sum of squares is then $SSE_1$).
  • In general, the next residual sum of squares obtained in the $N$-th execution of step S150 is denoted $SSE_N$ (the corresponding current residual sum of squares is then $SSE_{N-1}$).
  • The preset step size is denoted $l$.
  • The residual variation range is calculated as $(SSE_N - SSE_{N-1})/l$, where $N$ is a positive integer.
  • The latest current abnormal point ratio at this moment is not the optimal abnormal point ratio.
  • Instead, the current abnormal point ratio of the state immediately before it can be considered the optimal abnormal point ratio.
  • If the residual variation range exceeds the preset variation range threshold, some real abnormal points have been classified as normal points, which causes a sudden increase in the residual sum of squares between the abnormal points and the normal point center.
  • In that case, the previous state of the abnormal point ratio (that is, the current abnormal point ratio plus the step size) can be used as the optimal abnormal point ratio.
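The stopping rule can be written compactly. This is a sketch under one assumption that the text does not state explicitly: the ratio used in the $N$-th execution of step S150 is $m_0 - N\,l$, because steps S140 and S190 each subtract one step size. The symbols $m_N$, $\Delta_N$, $T$ (the variation range threshold), $N^{*}$ and $m^{*}$ are introduced here for illustration only.

```latex
\begin{aligned}
m_N &= m_0 - N\,l, \qquad N = 1, 2, \dots \\
\Delta_N &= \frac{SSE_N - SSE_{N-1}}{l} \\
N^{*} &= \min\{\, N : \Delta_N > T \,\} \\
m^{*} &= m_{N^{*}} + l = m_0 - (N^{*} - 1)\,l
\end{aligned}
```

In words, the search stops at the first step whose residual variation range exceeds the threshold and reports the ratio from one step earlier.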
  • the method further includes:
  • S190 If the residual variation range does not exceed the variation range threshold, subtract the step size from the current abnormal point ratio to update the current abnormal point ratio, update the current residual sum of squares with the next residual sum of squares, and return to step S150.
  • If the residual variation range still changes smoothly, the reduction in the abnormal point ratio is not yet enough to significantly affect the residual sum of squares between each data point of the abnormal category and the normal point center.
  • In that case, the step size is subtracted from the current abnormal point ratio to update it, and the next residual sum of squares is used to update the current residual sum of squares.
  • For example, when $(SSE_N - SSE_{N-1})/l$ does not exceed the preset variation range threshold, $SSE_1$ is first used as the current residual sum of squares and $(m_0 - l)$ as the current abnormal point ratio, and execution returns to step S150 to obtain $SSE_2$. When the flow reaches step S170 again, $(SSE_2 - SSE_1)/l$ is used as the residual variation range, and so on, until the residual variation range exceeds the preset variation range threshold.
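The iteration over steps S120 to S190 can be sketched as a loop. This is an illustration under stated assumptions, not the applicant's code: `classify` is a hypothetical callable standing in for the per-cluster single-class support vector machine and must return `(normal_points, abnormal_points)` for a given ratio; the default values of `m0`, `step` and `threshold`, the squared Euclidean residual, and returning `None` when the threshold is never exceeded are all choices made for the example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, center):
    """Residual sum of squares of the points against a fixed center."""
    return sum(squared_distance(p, center) for p in points)

def optimize_abnormal_ratio(cluster, classify, m0=0.5, step=0.05, threshold=100.0):
    """Lower the abnormal point ratio step by step until the SSE jumps.

    classify(cluster, ratio) is a hypothetical stand-in for the one-class SVM;
    it returns (normal_points, abnormal_points).
    """
    normal, abnormal = classify(cluster, m0)
    # S120: fix the normal point center once (normal point nearest to the mean).
    mean = tuple(sum(col) / len(normal) for col in zip(*normal))
    center = min(normal, key=lambda p: squared_distance(p, mean))
    sse_current = sse(abnormal, center)          # S130: SSE_0
    n = 1
    while m0 - n * step > 0:
        ratio = m0 - n * step                    # S140 / S190: subtract one step
        _, abnormal = classify(cluster, ratio)   # S150: reclassify, center unchanged
        sse_next = sse(abnormal, center)
        variation = (sse_next - sse_current) / step   # residual variation range
        if variation > threshold:
            return m0 - (n - 1) * step           # current ratio plus one step
        sse_current = sse_next                   # S190: next SSE becomes current
        n += 1
    return None  # assumption: threshold never exceeded before the ratio reaches 0
```

Computing each ratio as `m0 - n * step` rather than repeatedly subtracting avoids accumulating floating-point drift over many iterations. The sketch keeps the signed comparison from the text; whether a given classifier ever produces an increase in the residual sum of squares depends on how its abnormal set changes as the ratio shrinks.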
  • The selected cluster can then be classified according to the single-class support vector machine and the optimal abnormal point ratio to obtain the optimal classification result, together with the unsupervised classification model that has the best classification effect.
  • After step S181, the method further includes:
  • the storage area corresponding to the optimal classification result and the optimal abnormal point ratio is formatted and deleted.
  • After the optimal classification result corresponding to the set of data points to be classified and the optimal abnormal point ratio are obtained in the server, they are sent to the business end corresponding to the set of data points to be classified, so that the business end is effectively notified of the classification result.
  • the optimal classification result and the optimal abnormal point ratio can be sent to the cloud server in time at this time, and the corresponding data point set to be classified can be matched by the cloud server.
  • the set of data points to be classified corresponding to the optimal classification result and the optimal abnormal point ratio may also be synchronized to the cloud server.
  • The unique machine identification code of the business end (such as the IMEI serial number) must be used as the data identification bit that uniquely identifies the data.
  • The storage area in the server corresponding to the optimal classification result and the optimal abnormal point ratio can then be formatted and deleted to release storage space effectively.
  • the method before formatting and deleting the storage area corresponding to the optimal classification result and the optimal abnormal point ratio, the method further includes:
  • the number of iterations is sent to the business end corresponding to the set of data points to be classified, and the number of iterations is synchronously sent to the cloud server.
  • The difference between the preset current abnormal point ratio and the optimal abnormal point ratio may be divided by the step size to obtain the number of iterations. Once the number of iterations is known, it can be sent to the business end corresponding to the set of data points to be classified, and the business end can accumulate experience in setting the optimal abnormal point ratio accordingly.
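The division described above is a one-line calculation. The sketch below uses a hypothetical function name and rounds the quotient, an added safeguard against floating-point error when the ratios are stored as binary fractions.

```python
def iteration_count(initial_ratio, optimal_ratio, step):
    """Number of step-size reductions between the preset and the optimal ratio."""
    return round((initial_ratio - optimal_ratio) / step)

# Preset ratio 0.5, optimal ratio 0.3, step size 0.05
print(iteration_count(0.5, 0.3, 0.05))  # 4
```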
  • This method realizes the accurate classification of massive data and the detection of abnormal points in each classification.
  • the proportion of abnormal points in the detection process is automatically adjusted and obtained without setting based on experience.
  • the embodiment of the present application also provides a device for optimizing the proportion of abnormal points based on clustering and SSE.
  • The device for optimizing the proportion of abnormal points based on clustering and SSE is used to perform any of the aforementioned embodiments of the method for optimizing the proportion of abnormal points based on clustering and SSE.
  • FIG. 6 is a schematic block diagram of an abnormal point ratio optimization device based on clustering and SSE provided in an embodiment of the present application.
  • The device 100 for optimizing the proportion of abnormal points based on clustering and SSE may be configured in a server.
  • The device 100 for optimizing the proportion of abnormal points based on clustering and SSE includes a clustering unit 101, a multi-model construction unit 110, a normal point center acquisition unit 120, a first residual calculation unit 130, and a first ratio update unit 140.
  • the clustering unit 101 is configured to receive a set of data points to be classified, and cluster the set of data points to be classified through k-means clustering to obtain multiple clusters.
  • the clustering unit 101 includes:
  • the initial cluster center obtaining unit 1011 is used to select, from the set of data points to be classified, the same number of data points as the preset number of clusters, and to use the selected data points as the initial cluster centers of the clusters;
  • the initial clustering unit 1012 is configured to divide the set of data points to be classified according to the difference between each data point in the set of data points to be classified and each initial cluster center to obtain an initial clustering result;
  • the cluster center adjustment unit 1013 is configured to obtain the adjusted cluster center of each cluster according to the initial clustering result;
  • the cluster adjustment unit 1014 is configured to divide the set of data points to be classified according to each data point's difference from the adjusted cluster centers, until the clustering result has remained unchanged for more than a preset number of times, obtaining the clusters corresponding to the preset number of clusters.
  • the multi-model construction unit 110 is used to obtain the data points corresponding to each of the multiple clusters, and to construct, according to the preset current abnormal point ratio and each cluster, a single-class support vector machine for outlier detection in one-to-one correspondence with each cluster.
  • the multi-model construction unit 110 includes:
  • the classification parameter obtaining unit 111 is configured to obtain the first parameter and the second parameter of the hyperplane corresponding to the single classification support vector machine of each cluster according to the preset current abnormal point ratio and each cluster;
  • the model acquisition unit 112 is configured to construct a single-class support vector machine for abnormal point detection in a one-to-one correspondence with each cluster according to the first parameter and the second parameter of the hyperplane and the current abnormal point ratio.
  • the normal point center obtaining unit 120 is configured to classify the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain the normal point center of the normal category in the classification result.
  • the normal point center obtaining unit 120 includes:
  • the initial classification unit 121 is configured to classify the selected cluster according to the corresponding single-class support vector machine and the current abnormal point ratio to obtain a classification result corresponding to the selected cluster, wherein the classification result includes data points of the normal category and data points of the abnormal category;
  • the distance average calculation unit 122 is configured to obtain the average value corresponding to the data points of the normal category in the classification result to obtain the initial normal point center;
  • the normal point center adjustment unit 123 is configured to obtain the data point closest to the initial normal point center among the data points of the normal category in the classification result as the normal point center corresponding to the data points of the normal category.
  • the first residual calculation unit 130 is configured to obtain the residual square sum of each data point of the abnormal category in the classification result and the center of the normal point to obtain the current residual square sum.
  • the first ratio update unit 140 is configured to subtract a preset step size from the current abnormal point ratio to update the current abnormal point ratio.
  • the second residual calculation unit 150 is configured to classify the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain the data points of the current abnormal category, and to obtain the residual sum of squares between each data point of the current abnormal category and the normal point center as the next residual sum of squares.
  • the amplitude calculation unit 160 is configured to divide the difference between the next residual sum of squares and the current residual sum of squares by the step size to obtain the residual variation amplitude.
  • the determining unit 170 is configured to determine whether the residual variation range exceeds a preset variation range threshold.
  • the optimal ratio acquisition unit 180 is configured to, if the residual variation range exceeds the variation range threshold, use the current abnormal point ratio plus the step length as the optimal abnormal point ratio.
  • the device 100 for optimizing the proportion of abnormal points based on clustering and SSE further includes:
  • the second ratio update unit 190 is configured to, if the residual variation range does not exceed the variation range threshold, subtract the step size from the current abnormal point ratio to update the current abnormal point ratio, update the current residual sum of squares with the next residual sum of squares, and return to the step of classifying the selected cluster according to the single-class support vector machine and the current abnormal point ratio to obtain the data points of the current abnormal category and obtaining the residual sum of squares between each data point of the current abnormal category and the normal point center as the next residual sum of squares.
  • the optimal classification unit 181 is configured to classify the selected clusters according to the single classification support vector machine and the optimal anomaly point ratio to obtain an optimal classification result.
  • The selected cluster can then be classified according to the single-class support vector machine and the optimal abnormal point ratio to obtain the optimal classification result, together with the unsupervised classification model that has the best classification effect.
  • the device realizes accurate classification of massive data and detection of abnormal points in each classification, and the proportion of abnormal points in the detection process is automatically adjusted and obtained without setting based on experience.
  • The above-mentioned device for optimizing the proportion of abnormal points based on clustering and SSE can be implemented in the form of a computer program, which can be run on a computer device as shown in FIG. 11.
  • FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the processor 502 can execute the method for optimizing the proportion of abnormal points based on clustering and SSE.
  • the processor 502 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503.
  • the network interface 505 is used for network communication, such as providing data information transmission.
  • the structure shown in FIG. 11 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in a memory to implement the method for optimizing the proportion of abnormal points based on clustering and SSE disclosed in the embodiments of the present application.
  • the embodiment of the computer device shown in FIG. 11 does not constitute a limitation on the specific configuration of the computer device.
  • the computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different component arrangement.
  • the computer device may only include a memory and a processor. In such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 11, and will not be repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the method for optimizing the proportion of abnormal points based on clustering and SSE disclosed in the embodiments of the present application.
  • the storage medium is a physical, non-transitory storage medium that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

Abstract

The present application relates to a method and a device for optimizing the proportion of abnormal points based on clustering and SSE. The method comprises: receiving a set of data points to be classified, and clustering the set by K-means clustering to obtain multiple clusters; obtaining the data points corresponding to each of the clusters, and constructing a one-class support vector machine for each cluster according to a preset current proportion of abnormal points and that cluster; continuously adjusting the current proportion of abnormal points until the residual variation exceeds a variation threshold, and taking the current proportion of abnormal points together with the step size as the optimal proportion of abnormal points; and classifying the selected clusters according to the one-class support vector machine and the optimal proportion of abnormal points to obtain an optimal classification result.
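The search described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the one-class support vector machine is replaced by a simple stand-in that flags the share of points farthest from the cluster centroid, the proportion is assumed to be lowered by a fixed step each round, the "residual variation" is assumed to be the increase in the SSE of the points kept as normal, and "together with the step size" is read as the current proportion plus one step. All function names and default values (`start`, `step`, `threshold`) are hypothetical.

```python
import math
import random


def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    if not cluster:
        return 0.0
    center = centroid(cluster)
    return sum(math.dist(p, center) ** 2 for p in cluster)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns the non-empty clusters as lists of points."""
    centers = random.Random(seed).sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return [c for c in clusters if c]


def split_by_proportion(cluster, proportion):
    """Stand-in for the one-class SVM (assumption): flag the given share of
    points farthest from the centroid as abnormal. Returns (normal, abnormal)."""
    center = centroid(cluster)
    ranked = sorted(cluster, key=lambda p: math.dist(p, center))
    cut = len(ranked) - int(round(proportion * len(ranked)))
    return ranked[:cut], ranked[cut:]


def normal_sse(clusters, proportion):
    """Total SSE of the points kept as normal across all clusters."""
    return sum(sse(split_by_proportion(c, proportion)[0]) for c in clusters)


def optimize_proportion(clusters, start=0.30, step=0.05, threshold=50.0):
    """Lower the proportion step by step (assumed direction); stop when the
    SSE increase exceeds the threshold and return current proportion + step."""
    current = start
    previous_sse = normal_sse(clusters, current)
    while current - step > 1e-12:
        current = round(current - step, 10)
        current_sse = normal_sse(clusters, current)
        if current_sse - previous_sse > threshold:
            return round(current + step, 10)
        previous_sse = current_sse
    return current  # no jump found: fall back to the smallest proportion tried
```

A caller would pass a list of coordinate tuples to `kmeans`, feed the resulting clusters to `optimize_proportion`, and then call `split_by_proportion` on each cluster with the returned proportion. The intuition behind the stopping rule in this sketch is that, once the proportion drops below the true share of outliers, a genuine outlier is counted as normal and the SSE rises sharply, so the last proportion before that jump is kept.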
PCT/CN2019/117363 2019-01-28 2019-11-12 Method and device for optimizing the proportion of abnormal points based on clustering and SSE WO2020155756A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910079217.9A CN109961086A (zh) 2019-01-28 2019-01-28 Method and device for optimizing the proportion of abnormal points based on clustering and SSE
CN201910079217.9 2019-01-28

Publications (1)

Publication Number Publication Date
WO2020155756A1 true WO2020155756A1 (fr) 2020-08-06

Family

ID=67023504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117363 WO2020155756A1 (fr) 2019-01-28 2019-11-12 Method and device for optimizing the proportion of abnormal points based on clustering and SSE

Country Status (2)

Country Link
CN (1) CN109961086A (fr)
WO (1) WO2020155756A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
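CN109961086A (zh) * 2019-01-28 2019-07-02 平安科技(深圳)有限公司 Method and device for optimizing the proportion of abnormal points based on clustering and SSE
CN109919185A (zh) * 2019-01-28 2019-06-21 平安科技(深圳)有限公司 SSE-based method, apparatus, and computer device for optimizing the proportion of abnormal points
CN110458581B (zh) * 2019-07-11 2024-01-16 创新先进技术有限公司 Method and device for identifying abnormal merchant business turnover
CN110990867B (zh) * 2019-11-28 2023-02-07 上海观安信息技术股份有限公司 Modeling method and device for a database-based data leakage detection model, and leakage detection method and system
CN111459926A (zh) * 2020-03-26 2020-07-28 广西电网有限责任公司电力科学研究院 Method for identifying abnormal integrated energy data in a park
CN111540202B (zh) * 2020-04-23 2021-07-30 杭州海康威视系统技术有限公司 Similar checkpoint determination method, apparatus, electronic device, and readable storage medium
CN111612085B (zh) * 2020-05-28 2023-07-11 上海观安信息技术股份有限公司 Method and device for detecting abnormal points in a peer group
CN111914942A (zh) * 2020-08-12 2020-11-10 烟台海颐软件股份有限公司 Energy-use anomaly analysis method for multi-meter integration
WO2022155939A1 (fr) * 2021-01-25 2022-07-28 深圳大学 Data attribute clustering method, apparatus, and device, and storage medium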

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
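CN105389636A (zh) * 2015-12-11 2016-03-09 河海大学 KFCM-SVR method for predicting reasonable line loss in low-voltage transformer districts
CN106778908A (zh) * 2017-01-11 2017-05-31 湖南文理学院 Novelty detection method and device
CN108322363A (zh) * 2018-02-12 2018-07-24 腾讯科技(深圳)有限公司 Push data anomaly monitoring method, apparatus, computer device, and storage medium
CN109961086A (zh) * 2019-01-28 2019-07-02 平安科技(深圳)有限公司 Method and device for optimizing the proportion of abnormal points based on clustering and SSE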

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
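CN112801137A (zh) * 2021-01-04 2021-05-14 中国石油天然气集团有限公司 Big-data-based method and system for dynamic quality evaluation of petroleum pipes
CN113780354A (zh) * 2021-08-11 2021-12-10 国网上海市电力公司 Method and device for identifying abnormal telemetry data in a dispatching automation master station system
CN113780354B (zh) * 2021-08-11 2024-01-23 国网上海市电力公司 Method and device for identifying abnormal telemetry data in a dispatching automation master station system
CN114077872A (zh) * 2021-11-29 2022-02-22 税友软件集团股份有限公司 Data anomaly detection method and related device
CN116796214A (zh) * 2023-06-07 2023-09-22 南京北极光生物科技有限公司 Data clustering method based on differential features
CN116796214B (zh) * 2023-06-07 2024-01-30 南京北极光生物科技有限公司 Data clustering method based on differential features
CN116416078B (zh) * 2023-06-09 2023-08-15 济南百思为科信息工程有限公司 Audit supervision method for the accounting security of maintenance funds
CN116416078A (zh) * 2023-06-09 2023-07-11 济南百思为科信息工程有限公司 Audit supervision method for the accounting security of maintenance funds
CN116933107A (zh) * 2023-07-24 2023-10-24 水木蓝鲸(南宁)半导体科技有限公司 Data distribution boundary determination method, apparatus, computer device, and storage medium
CN116933107B (zh) * 2023-07-24 2024-05-10 水木蓝鲸(南宁)半导体科技有限公司 Data distribution boundary determination method, apparatus, computer device, and storage medium
CN116781984A (zh) * 2023-08-21 2023-09-19 深圳市华星数字有限公司 Optimized data storage method for set-top boxes
CN116781984B (zh) * 2023-08-21 2023-11-07 深圳市华星数字有限公司 Optimized data storage method for set-top boxes
CN117520994A (zh) * 2024-01-03 2024-02-06 深圳市活力天汇科技股份有限公司 Method and system for identifying users with abnormal air-ticket searches based on user profiling and clustering
CN117520994B (zh) * 2024-01-03 2024-04-19 深圳市活力天汇科技股份有限公司 Method and system for identifying users with abnormal air-ticket searches based on user profiling and clustering
CN117851464B (zh) * 2024-03-07 2024-05-14 济南道图信息科技有限公司 Auxiliary analysis method of user behavior patterns for psychological assessment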

Also Published As

Publication number Publication date
CN109961086A (zh) 2019-07-02

Similar Documents

Publication Publication Date Title
WO2020155756A1 (fr) Method and device for optimizing the proportion of abnormal points based on clustering and SSE
WO2020155755A1 (fr) Spectral clustering-based method, device, and computer apparatus for optimizing the proportion of abnormal points
WO2020155752A1 (fr) Outlier detection model verification method and apparatus, computer device, and storage medium
WO2020143304A1 (fr) Loss function optimization method and apparatus, computer device, and storage medium
TWI539298B (zh) Metrology sampling method with sampling rate decision mechanism and computer program product thereof
WO2021142916A1 (fr) Airfoil optimization method and apparatus based on a proxy-assisted evolutionary algorithm
US9037518B2 (en) Classifying unclassified samples
WO2022111327A1 (fr) Risk level data processing method and apparatus, storage medium, and electronic device
WO2021051529A1 (fr) Method, apparatus, and device for estimating cloud host resources, and storage medium
WO2021179544A1 (fr) Sample classification method and apparatus, computer device, and storage medium
JP2005535130A (ja) Method, system, and medium for handling misrepresented metrology data within an advanced process control system
JP5733229B2 (ja) Classifier creation device, classifier creation method, and computer program
US8112249B2 (en) System and methods for parametric test time reduction
WO2020155754A1 (fr) Outlier proportion optimization method and apparatus, computer device, and storage medium
KR102117637B1 (ko) Data preprocessing apparatus and method
WO2021169445A1 (fr) Information recommendation method and apparatus, computer device, and storage medium
TWI709932B (zh) Method, apparatus, and device for monitoring transaction indicators
WO2021098384A1 (fr) Data anomaly detection method and apparatus
WO2018006631A1 (fr) Automatic user-level segmentation method and system
KR20190008515A (ko) Process monitoring apparatus and method using an improved SAX technique and RTC technique
CN114116828A (zh) Association rule analysis method, device, and storage medium for multi-dimensional network indicators
CN114881167B (zh) Anomaly detection method, apparatus, electronic device, and medium
CN104992050A (zh) Prediction model selection method based on time-series characteristic evaluation using statistical signal processing
CN105306252A (zh) Method for automatically identifying server faults
CN109257952A (zh) Method, system, and program product for data transmission with a reduced data volume

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913220

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19913220

Country of ref document: EP

Kind code of ref document: A1