WO2022262869A1 - Data processing method and apparatus, network device and storage medium


Info

Publication number
WO2022262869A1
WO2022262869A1 · PCT/CN2022/099638 · CN2022099638W
Authority
WO
WIPO (PCT)
Prior art keywords
shortest
tree
data
node
forked
Prior art date
Application number
PCT/CN2022/099638
Other languages
English (en)
Chinese (zh)
Inventor
郑忠斌
王朝栋
彭新
Original Assignee
工业互联网创新中心(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 工业互联网创新中心(上海)有限公司
Publication of WO2022262869A1 publication Critical patent/WO2022262869A1/fr

Classifications

    • G06F 18/2323 — Physics; computing; electric digital data processing; pattern recognition; clustering techniques; non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G06F 18/2415 — Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08 — Neural networks; learning methods

Definitions

  • The present application relates to the field of communications technology, and in particular to a data processing method and apparatus, a network device, and a storage medium.
  • Some embodiments of the present application provide a data processing method and apparatus, a network device, and a storage medium.
  • An embodiment of the present application provides a data processing method, including: obtaining a target data set, performing rough clustering on the target data set with the shortest-forked-tree rough clustering algorithm, and forming multiple shortest forked trees from the rough clustering result; pruning and merging the shortest forked trees with a threshold pruning algorithm based on a rough-clustering neighborhood information system to obtain the simplified shortest forked tree; and calculating the abnormality degree of each data object in the simplified shortest forked tree with the outlier detection algorithm of balanced-fusion local multi-characteristic factors, then determining and eliminating abnormal data values in the target data set according to the abnormality degree.
  • An embodiment of the present application also provides a data processing device, including: a clustering module, used to obtain the target data set, perform rough clustering on it with the shortest-forked-tree rough clustering algorithm, and form multiple shortest forked trees from the rough clustering result; a processing module, used to prune and merge the shortest forked trees with a threshold pruning algorithm based on a rough-clustering neighborhood information system to obtain the simplified shortest forked tree; and a determination module, used to calculate the abnormality degree of each data object in the simplified shortest forked tree with the outlier detection algorithm of balanced-fusion local multi-characteristic factors, and to determine and eliminate abnormal data values in the target data set according to the abnormality degree.
  • An embodiment of the present application also provides a network device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above data processing method.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program that implements the above data processing method when executed by a processor.
  • FIG. 1 is a schematic flowchart of the data processing method provided in the first embodiment of the present application;
  • FIG. 2 is a schematic diagram of the shortest-forked-tree rough clustering algorithm in the data processing method provided in the first embodiment of the present application;
  • FIG. 3 is an example diagram of the search results of primary nodes in the data processing method provided in the first embodiment of the present application;
  • FIG. 4 is a schematic diagram of processing the shortest forked tree with the threshold pruning algorithm of the rough-clustering neighborhood information system in the data processing method provided by the first embodiment of the present application;
  • FIG. 5 is a schematic flowchart of the outlier detection algorithm of balanced-fusion local multi-characteristic factors in the data processing method provided by the first embodiment of the present application;
  • FIG. 6 is a schematic diagram of the network structure of the improved sparse autoencoder in the data processing method provided in the first embodiment of the present application;
  • FIG. 7 is an exemplary flowchart of the data processing method provided in the first embodiment of the present application;
  • FIG. 8 is a schematic diagram of the module structure of the data processing device provided in the second embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of the network device provided in the third embodiment of the present application.
  • The first embodiment of the present application relates to a data processing method that performs rough clustering on the target data set with the shortest-forked-tree rough clustering algorithm to form multiple shortest forked trees, prunes and merges them with the threshold pruning algorithm of the rough-clustering neighborhood information system, and then calculates the abnormality degree of each data object in the shortest forked trees with the outlier detection algorithm of balanced-fusion local multi-characteristic factors, determining and eliminating abnormal data values according to the abnormality degree.
  • Eliminating abnormal data values from the original data improves the efficiency of data analysis and the accuracy of decision-making.
  • Because an algorithm automatically analyzes the data of the target data set, the efficiency of data analysis is improved. At the same time, the outlier detection algorithm of balanced-fusion local multi-characteristic factors introduces local relative proximity into the standard local outlier factor to replace the local reachability density of data objects, adjusts the ratio of neighborhood dispersion to distance calculation into a form suited to rough clustering, and introduces the coefficient of variation to represent intra-class dispersion; the abnormality of data objects can therefore be analyzed accurately and quantitatively, so abnormal data values in the original data (i.e., the target data set) are accurately determined and eliminated, improving the accuracy of analysis results and decision-making.
  • the execution subject of the data processing method provided in the embodiments of the present application may be a server, wherein the server may be implemented by a single server or a server cluster composed of multiple servers, and the following uses the server as an example for illustration.
  • S101: Obtain the target data set, perform rough clustering on it using the shortest-forked-tree rough clustering algorithm, and form multiple shortest forked trees according to the rough clustering results.
  • the target data set may be real-time data or offline data, such as offline data of an enterprise.
  • the target data set refers to data at a certain moment.
  • S101 may include: determining the source node in the target data set, searching for the node nearest to the source node, and taking that node as the initial node; taking the initial node as the starting point and searching for the descendant node set with the adaptive node spacing as the neighborhood search radius; when a new node is found, taking it as the current parent node and again searching for the descendant node set with the adaptive node spacing as the neighborhood search radius, until no new node falls within the search radius; then ending the search, storing all nodes and node distances from the source node to the last generation, and forming the shortest forked tree from all nodes between the source node and the last-generation node.
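The generation-by-generation search of S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the adaptive-node-spacing rule used here (mean accepted edge length times a factor) is an assumption, since the exact formula is not reproduced in this record.

```python
import numpy as np

def grow_shortest_forked_tree(points, source, radius_factor=1.5):
    """Grow one shortest forked tree from `source` (a sketch).

    The adaptive neighbourhood search radius is assumed to be the mean
    of the edge lengths accepted so far, scaled by `radius_factor`.
    """
    in_tree = {source}
    edges = []                          # (parent, child, distance)
    d = np.linalg.norm(points - points[source], axis=1)
    d[source] = np.inf
    first = int(np.argmin(d))           # initial node: nearest to the source
    edges.append((source, first, float(d[first])))
    in_tree.add(first)
    frontier = [first]
    while frontier:
        radius = radius_factor * np.mean([e[2] for e in edges])
        next_frontier = []
        for parent in frontier:
            dist = np.linalg.norm(points - points[parent], axis=1)
            for child in np.argsort(dist):
                child = int(child)
                if child in in_tree or dist[child] > radius:
                    continue            # uniqueness: each node joins one branch only
                edges.append((parent, child, float(dist[child])))
                in_tree.add(child)
                next_frontier.append(child)
        frontier = next_frontier        # ends when no new node falls in the radius
    return in_tree, edges
```

Run on a small cluster plus one distant point, the distant point is never reached, which matches the intent that outliers sit outside the adaptive search radius.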
  • FIG. 2 is a schematic diagram of the algorithm process of the shortest bifurcated tree rough clustering algorithm in the data processing method provided by the embodiment of the present application.
  • the following uses a specific process as an example to illustrate:
  • The server collects the offline data of an enterprise as the target data set, assumes every data object in the target data set to be an outlier, and marks each as a source node, so the number of assumed source nodes equals the number of data objects in the offline data. The relevant attributes (loca, value) of each source node are stored, where loca is the location of the data object and value is its data value.
  • The global search strategy mainly consists of calculating the adaptive node spacing and determining adjacent nodes. Take the source node as the starting point and the search of the next two generations of node data sets as an example:
  • The first-generation node contains three attributes (loca, value, …). Taking the first-generation node as the starting point of the next level, the search continues with the adaptive node spacing as the neighborhood search radius; the search result is shown in Figure 3.
  • The set of descendant nodes of a first-generation node is not limited in number: every data object within the neighborhood search radius belongs to its child nodes, but the principle of uniqueness must be followed. The principle of uniqueness means that, between two adjacent generations of the same level, next-generⱼ can only be generated by searching from last-generᵢ. The mapping relationship can be one-to-one or one-to-many, but the data of the two generations cannot overlap, i.e., last-generᵢ → next-generⱼ with last-generᵢ ∩ next-generⱼ = ∅.
  • The shortest forked tree includes two types of data: one is the source node and its searched descendant node set; the other is the set of distances between corresponding nodes of all adjacent generations that form the shortest forked tree.
  • Because outliers in the target data set are characterized by low density of surrounding data objects and large spacing in their neighborhood, the dispersion between a local outlier and its adjacent points is large; the distance between different levels, that is, the adaptive node spacing, reflects this.
  • S102: Use a threshold pruning algorithm based on the rough-clustering neighborhood information system to prune and merge the shortest forked trees to obtain the simplified shortest forked tree.
  • FIG. 4 is a schematic diagram of the process of processing the shortest forked tree by using the threshold pruning algorithm of the rough clustering neighborhood information system in the data processing method provided by the embodiment of the present application.
  • S102 may include: according to the attributes of each data object in the shortest forked trees, merging branches that contain shared nodes into one shortest-forked-tree structure, and cutting off branches that completely intersect in the shortest forked tree, to obtain the simplified shortest forked tree.
  • Cutting off branches whose scores are less than or equal to the deviation threshold means cutting off branches whose data objects' Dist values are less than or equal to the deviation threshold.
  • Pruning such branches according to the deviation-threshold formula refers to pruning the weak-weight branch clusters in the shortest forked tree whose scores fall below the deviation threshold computed by that formula.
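The pruning rule above can be sketched as follows. The deviation-threshold formula itself is not given in this record, so `deviation_threshold` below is a hypothetical stand-in (mean branch distance minus k standard deviations); only the cut condition — Dist less than or equal to the threshold — comes from the text.

```python
import statistics

def prune_branches(edges, threshold):
    """Cut off branches whose Dist value is less than or equal to the
    deviation threshold; return (kept, cut) edge lists.
    Each edge is a (parent, child, dist) tuple from one shortest forked tree."""
    kept = [e for e in edges if e[2] > threshold]
    cut = [e for e in edges if e[2] <= threshold]
    return kept, cut

def deviation_threshold(edges, k=1.0):
    """Hypothetical stand-in for the patent's deviation-threshold formula
    (not reproduced in this record): mean branch distance minus
    k population standard deviations."""
    dists = [e[2] for e in edges]
    return statistics.mean(dists) - k * statistics.pstdev(dists)
```

Edges at or below the threshold are discarded as weak-weight branch clusters; the surviving edges form the simplified tree.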
  • S103: Calculate the abnormality degree of each data object in the simplified shortest forked tree using the outlier detection algorithm of balanced-fusion local multi-characteristic factors, and determine and eliminate abnormal data values in the target data set according to the abnormality degree.
  • The data in the simplified shortest forked tree is first numericized, i.e., standardized, where T_o denotes a simplified shortest-forked-tree branch and T_o-stand denotes the branch after numerical processing. Then N_dis(x) is computed as the inter-node distance measure within the same shortest forked tree, where x is the specified data object, x_i ranges over the other data objects in the same shortest-forked-tree class, K is the number of data objects in that class, and exp(1) is the constant with base e and exponent 1. The coefficient of variation of each shortest forked tree is then computed as

    N_cv(T_i) = N_std(T_i) / N_mean(T_i),

  where T_i represents the sum of the distances of all nodes in any shortest-forked-tree cluster class, x_c represents the distance of each node in the shortest forked tree corresponding to T_i, N_std(T_i) is the standard deviation of the cluster, and N_mean(T_i) is its mean; the formula also involves the number of nodes contained in the cluster class and the index number of the shortest forked tree.
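The intra-class dispersion step reduces to the standard coefficient-of-variation formula N_cv(T_i) = N_std(T_i) / N_mean(T_i). A minimal sketch, taking the per-node distances x_c of one tree as input (the garbled N_dis(x) distance formula is not reproduced here):

```python
import statistics

def coefficient_of_variation(node_distances):
    """N_cv(T_i) = N_std(T_i) / N_mean(T_i) for one shortest-forked-tree
    cluster, where node_distances holds the distance x_c of each node
    in the tree T_i."""
    n_mean = statistics.mean(node_distances)    # N_mean(T_i)
    n_std = statistics.pstdev(node_distances)   # N_std(T_i); sample stdev is equally plausible
    return n_std / n_mean
```

A larger coefficient of variation indicates a more dispersed cluster, which the detection step then weighs when scoring data objects.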
  • FIG. 5 is a schematic flowchart of the outlier detection algorithm of balanced-fusion local multi-characteristic factors in the data processing method provided by the embodiment of the present application.
  • The local relative proximity (Local Relative Proximity, LRP) is introduced into the standard local outlier factor (Local Outlier Factor, LOF) to replace the local reachability density (Local Reachability Density, LRD) of data objects; the ratio of neighborhood dispersion to distance calculation is adjusted into a form suited to rough clustering, and the coefficient of variation is introduced to represent intra-class dispersion. The abnormality of data objects can therefore be analyzed accurately and quantitatively, and the objects judged abnormal (i.e., abnormal data values) eliminated.
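As an illustration of this LOF-style scoring, the sketch below computes a ratio-based score in the spirit of local relative proximity. The patent's actual LRP formula is not given on this page, so the definition used here — a point's mean k-nearest-neighbour distance divided by the mean of that same quantity over its neighbours — is an assumption:

```python
import numpy as np

def local_relative_proximity_scores(points, k=3):
    """LOF-style anomaly scores (a sketch, not the patent's formula).

    A score well above 1 means the point is far sparser than its
    neighbourhood and is flagged as a candidate outlier.
    """
    n = len(points)
    # Pairwise distance matrix
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # k nearest neighbours of each point (index 0 of argsort is the point itself)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]
    mean_knn = np.take_along_axis(dists, knn, axis=1).mean(axis=1)
    # Ratio of a point's mean k-NN distance to its neighbours' average
    scores = mean_knn / np.array([mean_knn[knn[i]].mean() for i in range(n)])
    return scores
```

On a tight cluster plus one remote point, the remote point's score dominates while cluster members stay near 1.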
  • After S103, the method may further include: reducing the dimensionality of the target data set with an improved sparse autoencoder, wherein the improved sparse autoencoder uses a sparse rule operator instead of the KL relative entropy as the sparsity constraint term and uses the L2 norm as the regularization term.
  • The sparse autoencoder adds a sparsity limit on the activity of hidden-layer neurons; this limit is expressed through the activation of hidden neuron j-ac of the autoencoder network given the input X. The average activation of hidden neuron j-ac of the sparse autoencoder is defined as the mean of its activations, where the index j-ac represents the position label of each neuron, H represents the number of neurons in the input layer, and h represents the index of each neuron in the input layer.
  • The loss function of the original sparse autoencoder is generally represented by the mean squared error, to which the KL divergence is added as a sparsity constraint. The specific formula involves f'(z_q), the derivative of the output layer z of the neural network, where q denotes the number of neurons in the output layer.
  • The modified sparse autoencoder constructs the following objective loss function (reconstructed from the surrounding definitions):

    J_sparse(W, b) = J(W, b) + λ₁ Σ_{s=1}^{S₂} |W_s| + λ₂ ‖W‖₂²,

  where λ₁ is the weight of the sparsity penalty term, λ₂ is the weight decay coefficient, S₂ represents the number of neurons in the hidden layer, W_s represents the weight coefficient of hidden-layer neuron s in the neural network, b represents the bias term of the neural network, s represents the index of the hidden-layer neuron with range [1, S₂], J(W, b) represents the initial loss function term of the sparse autoencoder, J_sparse(W, b) represents the objective loss function of the improved sparse autoencoder, y represents the true value, and h_{W,b}(x) represents the predicted value of the network for input x.
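The improved objective can be sketched as follows. Treating the sparse rule operator as the L1 norm of the hidden-layer weights is an interpretation, and `lam1`/`lam2` are arbitrary illustrative values standing in for λ₁ and λ₂:

```python
import numpy as np

def sparse_ae_loss(y_true, y_pred, hidden_weights, lam1=1e-3, lam2=1e-4):
    """Objective of the improved sparse autoencoder (a sketch).

    J(W, b) is taken as the mean-squared reconstruction error; the L1
    norm of the hidden-layer weights stands in for the sparse rule
    operator that replaces the KL term, and the squared L2 norm is the
    weight-decay regularizer.
    """
    j_wb = np.mean((y_true - y_pred) ** 2)                # initial loss J(W, b)
    sparse_term = lam1 * np.abs(hidden_weights).sum()     # L1 sparsity constraint
    decay_term = lam2 * np.square(hidden_weights).sum()   # L2 regularizer
    return j_wb + sparse_term + decay_term
```

The L1 term drives individual weights to zero (sparsity), while the L2 term shrinks all weights smoothly to limit overfitting — the two roles the text assigns to the sparse rule operator and the regularization term.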
  • FIG. 6 is a schematic diagram of a network mechanism of an improved sparse autoencoder of the data processing method provided in the embodiment of the present application.
  • FIG. 7 is an example flow diagram of a data processing method provided in an embodiment of the present application.
  • Using the sparse rule operator improves the performance of the algorithm's coefficients; using the L2 norm as the regularization term balances the weights of the polynomial components and prevents the improved sparse autoencoder from overfitting when processing data.
  • Using the improved sparse autoencoder to reduce the dimensionality of the data that has passed outlier detection reduces data redundancy and improves the conciseness and reliability of the data.
  • In summary, the data processing method provided by the embodiment of the present application performs rough clustering on the target data set with the shortest-forked-tree rough clustering algorithm to form multiple shortest forked trees, prunes and merges them with the threshold pruning algorithm of the rough-clustering neighborhood information system, and then calculates the abnormality degree of each data object in the shortest forked trees with the outlier detection algorithm of balanced-fusion local multi-characteristic factors, determining and eliminating abnormal data values according to the abnormality degree. Because an algorithm automatically analyzes the data of the target data set, analysis efficiency is improved; and because the outlier detection algorithm introduces local relative proximity to replace the local reachability density of the standard local outlier factor, adjusts the ratio of neighborhood dispersion to distance calculation into a form suited to rough clustering, and introduces the coefficient of variation to represent intra-class dispersion, the abnormality of data objects can be analyzed accurately and quantitatively, so abnormal data values in the original data (i.e., the target data set) are accurately determined and eliminated, improving the accuracy of analysis results and decision-making.
  • the second embodiment of the present application relates to a data processing device 200, as shown in FIG. 8 , comprising: a clustering module 201, a processing module 202, and a determination module 203.
  • the clustering module 201 is used to obtain the target data set, perform rough clustering on the target data set by using the shortest fork tree rough clustering algorithm, and form multiple shortest fork trees according to the rough clustering results;
  • the processing module 202 is configured to use a threshold pruning algorithm based on a rough clustering neighborhood information system to prune and merge the shortest forked tree to obtain a simplified shortest forked tree;
  • The determination module 203 is used to calculate the abnormality degree of each data object in the simplified shortest forked tree with the outlier detection algorithm of balanced-fusion local multi-characteristic factors, and to determine and eliminate abnormal data values in the target data set according to the abnormality degree.
  • The data processing device 200 provided in the embodiment of the present application further includes a dimensionality-reduction module, used to reduce the dimensionality of the target data set with an improved sparse autoencoder, wherein the improved sparse autoencoder uses the sparse rule operator instead of the KL relative entropy as the sparsity constraint term and uses the L2 norm as the regularization term.
  • the dimensionality reduction module is also used for:
  • the ⁇ 1 is the weight of the sparse penalty item
  • the ⁇ 2 is the weight attenuation coefficient
  • S 2 represents the number of neurons in the hidden layer
  • W s represents the weight coefficient of all hidden layer neurons in the neural network
  • b represents the bias of the neural network Set item
  • s represents the index of the hidden layer neuron, and its range is [1,S 2 ]
  • J(W,b) represents the initial loss function item of the sparse autoencoder
  • J sparse (W,b) represents the improved Target loss function for sparse autoencoders;
  • determination module 203 is specifically used for:
  • N_dis(x) is the inter-node distance measure of the shortest forked tree;
  • x is the specified data object;
  • x_i ranges over the other data objects in the shortest-forked-tree class;
  • K represents the number of data objects in the shortest-forked-tree class;
  • exp(1) represents the constant with base e and exponent 1;
  • T i represents the sum of the distances of all nodes in any shortest forked tree cluster class
  • x c represents the distance of each node in the shortest forked tree corresponding to T i
  • represents the number of nodes contained in the cluster class
  • is the number of the shortest forked tree
  • N std (T i ) is the standard deviation of the shortest forked tree cluster
  • N mean (T i ) is the average value of the class
  • N cv (T i ) is the coefficient of variation.
  • The MDILAF value is used as the abnormality degree of the data object, and abnormal data values in the target data set are determined and eliminated according to this abnormality degree, wherein LRP(x_i) is the local relative proximity of the other data objects in the class except x, and N(x) is the shortest forked tree of data object x.
  • clustering module 201 is specifically used for:
  • The branches containing shared nodes are merged into one shortest-forked-tree structure, and the branches that completely intersect in the shortest forked tree are cut off, to obtain the simplified shortest forked tree.
  • processing module 202 is also used for:
  • Cutting off branches whose scores are less than or equal to the deviation threshold refers to pruning branches whose data objects' Dist values are less than or equal to the deviation threshold.
  • this embodiment is a device embodiment corresponding to the first embodiment, and this embodiment can be implemented in cooperation with the first embodiment.
  • the relevant technical details mentioned in the first embodiment are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
  • the relevant technical details mentioned in this implementation manner can also be applied in the first implementation manner.
  • The modules involved in this embodiment are logical modules; a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units.
  • Units that are not closely related to solving the technical problems proposed in the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
  • The third embodiment of the present application relates to a network device. As shown in FIG. 9, it includes at least one processor 301 and a memory 302 communicatively connected to the at least one processor 301, wherein the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 so that the at least one processor 301 can perform the above data processing method.
  • the memory 302 and the processor 301 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 301 and various circuits of the memory 302 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 301 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 301 .
  • the processor 301 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management and other control functions. And the memory 302 may be used to store data used by the processor 301 when performing operations.
  • the fourth embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • the computer program is executed by the processor, the above-mentioned method embodiments are realized.
  • The program is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Discrete Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed is a data processing method in the technical field of communications, comprising: acquiring a target data set, performing rough clustering on the target data set using a shortest-forked-tree rough clustering algorithm, and forming a plurality of shortest forked trees according to a rough clustering result; pruning and merging the shortest forked trees using a threshold pruning algorithm based on a rough-clustering neighborhood information system to obtain a simplified shortest forked tree; and calculating an abnormality degree of a data object in the simplified shortest forked tree using an outlier detection algorithm of balanced-fusion local multi-characteristic factors, and determining and eliminating an abnormal data value in the target data set according to the abnormality degree. Also disclosed are a data processing apparatus, a network device, and a storage medium.
PCT/CN2022/099638 2021-06-18 2022-06-17 Procédé et appareil de traitement de données, dispositif de réseau et support de stockage WO2022262869A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110678862.X 2021-06-18
CN202110678862.XA CN113420804B (zh) 2021-06-18 2021-06-18 数据处理方法、装置、网络设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022262869A1 true WO2022262869A1 (fr) 2022-12-22

Family

ID=77789079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099638 WO2022262869A1 (fr) 2021-06-18 2022-06-17 Procédé et appareil de traitement de données, dispositif de réseau et support de stockage

Country Status (2)

Country Link
CN (1) CN113420804B (fr)
WO (1) WO2022262869A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272216A (zh) * 2023-11-22 2023-12-22 中国建材检验认证集团湖南有限公司 一种自动流量监测站和人工水尺观测站的数据分析方法
CN117370331A (zh) * 2023-12-08 2024-01-09 河北建投水务投资有限公司 小区用水总数据清洗方法及装置、终端设备、存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420804B (zh) 2021-06-18 2024-06-18 工业互联网创新中心(上海)有限公司 Data processing method and apparatus, network device, and storage medium
CN114742178B (zh) * 2022-06-10 2022-11-08 航天亮丽电气有限责任公司 Method for non-invasive protective pressure plate state monitoring via a MEMS six-axis sensor
CN115202661B (zh) * 2022-09-15 2022-11-29 深圳大学 Hybrid generation method with hierarchical structure layout, and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444247A (zh) * 2020-06-17 2020-07-24 北京必示科技有限公司 Root cause localization method and apparatus based on KPI indicators, and storage medium
CN111985837A (zh) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Risk analysis method, apparatus, device, and storage medium based on hierarchical clustering
CN112800148A (zh) * 2021-02-04 2021-05-14 国网福建省电力有限公司 Method for screening scattered and polluting enterprises based on a clustering feature tree and outlier quantification
CN113420804A (zh) * 2021-06-18 2021-09-21 工业互联网创新中心(上海)有限公司 Data processing method and apparatus, network device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6871201B2 (en) * 2001-07-31 2005-03-22 International Business Machines Corporation Method for building space-splitting decision tree


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272216A (zh) * 2023-11-22 2023-12-22 中国建材检验认证集团湖南有限公司 Data analysis method for automatic flow monitoring stations and manual water gauge observation stations
CN117272216B (zh) * 2023-11-22 2024-02-09 中国建材检验认证集团湖南有限公司 Data analysis method for automatic flow monitoring stations and manual water gauge observation stations
CN117370331A (zh) * 2023-12-08 2024-01-09 河北建投水务投资有限公司 Method and apparatus for cleaning aggregate residential water consumption data, terminal device, and storage medium
CN117370331B (zh) * 2023-12-08 2024-02-20 河北建投水务投资有限公司 Method and apparatus for cleaning aggregate residential water consumption data, terminal device, and storage medium

Also Published As

Publication number Publication date
CN113420804B (zh) 2024-06-18
CN113420804A (zh) 2021-09-21

Similar Documents

Publication Publication Date Title
WO2022262869A1 (fr) Data processing method and apparatus, network device, and storage medium
US11210144B2 (en) Systems and methods for hyperparameter tuning
US20200342007A1 (en) Path generation and selection tool for database objects
US9684874B2 (en) Parallel decision or regression tree growing
CN113110866B (zh) 一种数据库变更脚本的评估方法及装置
CN108804473B (zh) 数据查询的方法、装置和数据库系统
WO2018107128A9 (fr) Systèmes et procédés d'automatisation de flux de travaux analytiques d'apprentissage machine de science de données
US20050278139A1 (en) Automatic match tuning
AU2017246552A1 (en) Self-service classification system
US20030208284A1 (en) Modular architecture for optimizing a configuration of a computer system
CN110147357A (zh) 一种基于大数据环境下的多源数据聚合抽样方法及系统
CN112711591B (zh) 基于知识图谱的字段级的数据血缘确定方法及装置
CN103513983A (zh) 用于预测性警报阈值确定工具的方法和系统
US10417580B2 (en) Iterative refinement of pathways correlated with outcomes
CN114116829A (zh) 异常数据分析方法、异常数据分析系统和存储介质
Asadifar et al. Semantic association rule mining: a new approach for stock market prediction
CN111125199B (zh) 一种数据库访问方法、装置及电子设备
Sun Study on application of data mining technology in university computer network educational administration management system
Yan et al. A clustering algorithm for multi-modal heterogeneous big data with abnormal data
WO2012133941A1 (fr) Method for matching elements in database schemas by means of a Bayesian network
US20220172105A1 (en) Efficient and scalable computation of global feature importance explanations
US11853945B2 (en) Data anomaly forecasting from data record meta-statistics
US20220147515A1 (en) Systems, methods, and program products for providing investment expertise using a financial ontology framework
Karegar et al. Data-mining by probability-based patterns
CN115996169A (zh) 一种网络故障分析方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22824341

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22824341

Country of ref document: EP

Kind code of ref document: A1