CN113420804A - Data processing method, device, network equipment and storage medium - Google Patents


Info

Publication number
CN113420804A
CN113420804A (application CN202110678862.XA)
Authority
CN
China
Prior art keywords
shortest
data
node
tree
bifurcation
Prior art date
Legal status
Pending
Application number
CN202110678862.XA
Other languages
Chinese (zh)
Inventor
郑忠斌
王朝栋
彭新
Current Assignee
Industrial Internet Innovation Center Shanghai Co ltd
Original Assignee
Industrial Internet Innovation Center Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Industrial Internet Innovation Center Shanghai Co ltd filed Critical Industrial Internet Innovation Center Shanghai Co ltd
Priority to CN202110678862.XA priority Critical patent/CN113420804A/en
Publication of CN113420804A publication Critical patent/CN113420804A/en
Priority to PCT/CN2022/099638 priority patent/WO2022262869A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2323Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Discrete Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to the technical field of communications and discloses a data processing method comprising the following steps: acquiring a target data set, performing rough clustering on the target data set with a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to the rough clustering result; pruning and merging the shortest bifurcation trees with a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain simplified shortest bifurcation trees; and calculating the abnormal degree of the data objects in the simplified shortest bifurcation trees with an abnormal value detection algorithm for balanced fusion of the data's local multi-feature factors, then determining and removing the abnormal data values in the target data set according to the abnormal degree. The embodiment of the invention also discloses a data processing apparatus, a network device, and a storage medium. The disclosed data processing method, apparatus, network device, and storage medium can eliminate abnormal data values in the original data and improve the efficiency of data analysis and the accuracy of decision-making.

Description

Data processing method, device, network equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data processing method, an apparatus, a network device, and a storage medium.
Background
When an enterprise makes a decision, analyzing the data first allows the decision to be made more scientifically and accurately. On one hand, however, with the development of information technology enterprises generate more and more data, so an enterprise that analyzes its data often faces a very large amount of it when making decisions; on the other hand, most enterprises still rely on experience or traditional data analysis means, and when a large amount of data is analyzed by such means to uncover latent rules or changes in the data, the analysis efficiency is low and subjective differences make the analysis result insufficiently accurate, which in turn affects decision accuracy. In particular, if abnormal data values exist in the original data and are not removed during data analysis, the analysis may acquire an irreversible deviation, seriously affecting the accuracy of the analysis result and causing large decision errors.
Disclosure of Invention
The embodiment of the invention aims to provide a data processing method, a data processing device, network equipment and a storage medium, which can eliminate abnormal data values in original data and improve the efficiency of data analysis and the accuracy of decision.
In order to solve the above technical problem, an embodiment of the present invention provides a data processing method, including: acquiring a target data set, carrying out rough clustering on the target data set by adopting a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to a rough clustering result; pruning and combining the shortest bifurcation trees by adopting a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain the simplified shortest bifurcation trees; and calculating the abnormal degree of the data object in the simplified shortest bifurcation tree by adopting an abnormal value detection algorithm of the balanced fusion data local multi-feature factors, and determining and removing the abnormal data value in the target data set according to the abnormal degree.
An embodiment of the present invention further provides a data processing apparatus, including: the clustering module is used for acquiring a target data set, carrying out rough clustering on the target data set by adopting a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to a rough clustering result; the processing module is used for pruning and combining the shortest bifurcation trees by adopting a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain the simplified shortest bifurcation trees; and the determining module is used for calculating the abnormal degree of the data object in the simplified shortest bifurcation tree by adopting an abnormal value detection algorithm of the balanced fusion data local multi-feature factor, and determining and removing the abnormal data value in the target data set according to the abnormal degree.
An embodiment of the present invention further provides a network device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the data processing method.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the data processing method described above.
Compared with the related technology, the embodiment of the invention adopts the shortest bifurcation tree rough clustering algorithm to carry out rough clustering on a target data set to form a plurality of shortest bifurcation trees, then adopts the threshold pruning algorithm of a rough clustering neighborhood information system to carry out pruning and merging on the shortest bifurcation trees, then calculates the abnormal degree of the data object in the shortest bifurcation trees by using the abnormal value detection algorithm of the balanced fusion data local multi-feature factors, and determines and eliminates the abnormal data value according to the abnormal degree of the data object. Because the data of the target data set is automatically analyzed by adopting the algorithm, the data analysis efficiency can be improved; meanwhile, due to the abnormal value detection algorithm of the balanced fusion data local multi-feature factor, local relative proximity is introduced into the standard local abnormal factor to replace local reachable density of the data object, the calculation ratio of neighborhood dispersion degree and distance is adjusted to a calculation mode suitable for rough clustering, and variation coefficient representation intra-class dispersion degree is introduced, so that the abnormal degree of the data object can be accurately and quantitatively analyzed, abnormal data values in original data (namely a target data set) are determined and removed according to the abnormal degree, and the accuracy of analysis results and decision is improved.
In addition, after the abnormal degree of the data objects in the simplified shortest bifurcation tree is calculated with the abnormal value detection algorithm for balanced fusion of the data's local multi-feature factors, and the abnormal data values in the target data set are determined and removed according to the abnormal degree, the method further comprises the following step: performing dimensionality reduction on the target data set with an improved sparse autoencoder, wherein the improved sparse autoencoder adopts a sparse rule operator instead of the KL relative entropy as the sparsity constraint term and adopts the L2 norm as the regular term. Replacing the KL relative entropy with a sparse rule operator as the sparsity constraint term improves the sparsity performance of the algorithm; adopting the L2 norm as the regular term balances the polynomial component weights and improves the sparse autoencoder's ability to prevent overfitting when processing data; meanwhile, performing dimension reduction with the improved sparse autoencoder on the data after abnormal value detection reduces data redundancy and improves the simplicity and reliability of the data.
In addition, the improved sparse autoencoder is adopted to reduce the dimension of the target data set, and the method comprises the following steps: the following objective loss function is constructed from the improved sparse autoencoder:
J_spare(W, b) = J(W, b) + λ1 · Σ_{j=1}^{s2} |a_j| + λ2 · ||W||₂², where a_j is the activation of hidden-layer neuron j (the L1 sparse rule operator term replaces the KL relative entropy, and the L2 norm of the weights is the regular term)
wherein λ1 is the sparse penalty term weight, λ2 is the weight attenuation coefficient, s2 represents the number of hidden-layer neurons, W represents the weight coefficients of the neural network, b represents the bias terms of the neural network, j represents the index of a neuron, J(W, b) represents the initial loss function term of the sparse autoencoder, and J_spare(W, b) represents the objective loss function of the improved sparse autoencoder; the dimension of the target data set is then reduced according to the target loss function.
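The shape of this objective (initial reconstruction loss, an L1 sparse-rule penalty over the hidden activations in place of KL divergence, and an L2 weight-decay regular term) can be sketched in NumPy as follows; function and variable names are illustrative, not from the patent, and the exact reconstruction term is an assumption:

```python
import numpy as np

def sparse_ae_loss(X, X_hat, A_hidden, W, lam1=1e-3, lam2=1e-4):
    """Sketch of J_spare(W, b) = J(W, b) + L1 sparsity + L2 weight decay.

    X, X_hat : (n, d) inputs and their reconstructions
    A_hidden : (n, s2) hidden-layer activations (s2 hidden neurons)
    W        : flattened weight coefficients of the network
    """
    j_wb = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))  # initial loss J(W, b)
    sparsity = lam1 * np.sum(np.abs(A_hidden))              # L1 sparse rule operator
    weight_decay = lam2 * np.sum(W ** 2)                    # L2 norm regular term
    return j_wb + sparsity + weight_decay
```

With a perfect reconstruction the first term vanishes and only the sparsity and weight-decay penalties remain, which is what drives the encoder toward sparse, small-weight representations.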
In addition, calculating the abnormal degree of the data objects in the simplified shortest bifurcation tree with the abnormal value detection algorithm for balanced fusion of the data's local multi-feature factors, and determining and removing the abnormal data values in the target data set according to the abnormal degree, comprises the following steps: according to T_i-stand = T_i + |min(T_i)|, standardizing the data in the simplified shortest bifurcation tree; according to
[equation image in source: N_dis(x), the node-distance measure]
Calculating the distance between the nodes in the same shortest bifurcation tree, wherein N_dis(x) is the calculated distance among the nodes of the shortest bifurcation tree, x is a specified data object, x_i are the other data objects in the shortest bifurcation tree class, k represents the number of data objects in the shortest bifurcation tree class, and exp(1) denotes the constant e (e raised to the power 1); calculating the coefficient of variation of the data in the shortest bifurcation tree according to the following formulas respectively:
N_mean(T) = (1/k) Σ_{q=1}^{k} x_q
N_std(T) = sqrt( (1/k) Σ_{q=1}^{k} (x_q − N_mean(T))² )
N_cv(T) = N_std(T) / N_mean(T)
wherein T represents the sum of the distances of all nodes in any shortest bifurcation tree cluster class, i represents the index label of T, x_q represents the distance of each node in the shortest bifurcation tree, k represents the number of nodes contained in the cluster class, j is the number of the shortest bifurcation tree, N_std(T) is the standard deviation of the class, N_mean(T) is the mean value of the class, and N_cv(T) is the coefficient of variation; according to
[equation image in source: local relative proximity (LRP) of a data object]
calculating the local relative proximity of the data objects in the class; calculating, according to the local relative proximity,
[equation image in source: the MDILAF score]
taking the MDILAF as the abnormal degree of the data object, and determining and eliminating the abnormal data values in the target data set according to the abnormal degree, wherein N_x is the shortest bifurcation tree class of data object x and |N(x)| is the sum of the distances of all the remaining data objects in the class. Because the abnormal value detection algorithm for balanced fusion of the data's local multi-feature factors introduces Local Relative Proximity (LRP) into the standard Local Outlier Factor (LOF) in place of the data object's Local Reachable Density (LRD), adjusts the calculation ratio of neighborhood dispersion degree to distance into a calculation mode suited to rough clustering, and introduces the coefficient of variation to represent the intra-class dispersion degree, the abnormal degree of a data object can be accurately and quantitatively analyzed and the identified abnormal data values removed.
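The standardization step and the class statistics named above (mean, standard deviation, and coefficient of variation of the node distances) can be sketched as follows. The patent's exact N_dis, LRP, and MDILAF formulas appear only as images in the source, so this covers just the standard statistics, with illustrative names:

```python
import numpy as np

def shift_standardize(t):
    """T_i-stand = T_i + |min(T_i)|: shift all distances to be non-negative."""
    t = np.asarray(t, dtype=float)
    return t + abs(t.min())

def class_dispersion(node_distances):
    """Mean N_mean(T), standard deviation N_std(T), and coefficient of
    variation N_cv(T) of the node distances x_q within one SFT cluster."""
    t = np.asarray(node_distances, dtype=float)
    n_mean = t.mean()
    n_std = t.std()  # population standard deviation over the k nodes
    return n_mean, n_std, n_std / n_mean
```

The coefficient of variation is scale-free, which is why the text uses it to compare dispersion across clusters whose distance sums differ in magnitude.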
In addition, performing rough clustering on the target data set with the shortest bifurcation tree rough clustering algorithm and forming a plurality of shortest bifurcation trees according to the rough clustering result comprises the following steps: determining a source node in the target data set; searching for the nearest node of the source node and taking that nearest node as the primary node; then, starting with the primary node as the current parent node, cyclically searching for the descendant node set with the current parent node as the starting point and the adaptive node spacing as the neighborhood search radius; if a new node exists within the neighborhood search radius, taking the new node as the current parent node and continuing to search for the descendant node set with the adaptive node spacing as the neighborhood search radius, until no new node exists within the neighborhood search radius; then ending the search, storing all nodes and node distances from the source node to the last-generation node, and forming a shortest bifurcation tree from all nodes from the source node to the last-generation node, wherein the node distances are the sets of distances between same-level nodes and their descendant nodes, and the adaptive node spacing is: Dist = arg min(Euclidean_dist(last-gener_i, next-gener_j)), where last-gener_i is a parent node in two adjacent generations and next-gener_j is a descendant node in two adjacent generations.
In addition, pruning and merging the shortest bifurcation trees with the threshold pruning algorithm based on the rough clustering neighborhood information system to obtain the simplified shortest bifurcation tree comprises the following steps: merging the branches containing shared nodes into one shortest bifurcation tree structure according to the attributes of each data object in the shortest bifurcation tree, and cutting off the completely intersected branches in the shortest bifurcation tree to obtain the simplified shortest bifurcation tree. By pruning the completely intersected branches in the rough clustering's shortest bifurcation tree and merging the branches containing shared nodes, the data structure of the shortest bifurcation tree can be further simplified, facilitating further processing of subsequent data.
In addition, after combining the branches containing shared nodes into one shortest bifurcation tree structure and cutting off the completely intersected branches in the shortest bifurcation tree, the method further comprises the following steps: calculating the median and average of the sums of the distances of each data object in the shortest bifurcation tree according to the Dist attribute of each data object, and cutting off the branches whose score is less than or equal to a deviation threshold according to a deviation threshold formula, wherein the deviation threshold formula is: DEV = average + (average − median), where DEV is the deviation threshold, average is the mean, and median is the median. Cutting out the weak-weight branch clusters whose scores are below the deviation threshold with the deviation threshold formula further simplifies the data structure of the shortest bifurcation tree and facilitates further processing of subsequent data.
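Reading the translated threshold formula as DEV = average + (average − median) over the per-branch Dist sums, the pruning step might look like this sketch; the names, the dict-based data layout, and that reading of the garbled formula are all assumptions:

```python
import numpy as np

def deviation_threshold(dist_sums):
    """DEV = average + (average - median) of the per-branch distance sums.

    Interpretation of the translated source formula (an assumption): when the
    mean exceeds the median, the threshold rises above the mean, so only
    clearly heavy branches survive.
    """
    mean = float(np.mean(dist_sums))
    median = float(np.median(dist_sums))
    return mean + (mean - median)

def prune_weak_branches(branch_scores):
    """Drop branches whose score is <= DEV (the weak-weight branch clusters)."""
    dev = deviation_threshold(list(branch_scores.values()))
    return {name: s for name, s in branch_scores.items() if s > dev}
```

For scores {1, 2, 10} the threshold is 13/3 + (13/3 − 2) ≈ 6.67, so only the branch scoring 10 is retained.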
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
FIG. 1 is a schematic flow chart of a data processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the algorithm process of the shortest bifurcation tree rough clustering algorithm in the data processing method according to the first embodiment of the present invention;
fig. 3 is an exemplary diagram of a search result of a primary node in the data processing method according to the first embodiment of the present invention;
FIG. 4 is a schematic diagram of the process of processing a shortest bifurcation tree with the threshold pruning algorithm of the rough clustering neighborhood information system in the data processing method according to the first embodiment of the present invention;
FIG. 5 is a schematic flowchart of an abnormal value detection algorithm using a local multi-feature factor of balanced fusion data in the data processing method according to the first embodiment of the present invention;
FIG. 6 is a schematic diagram of a network mechanism of an improved sparse autoencoder of the data processing method provided by the first embodiment of the present invention;
FIG. 7 is a flowchart illustrating a data processing method according to a first embodiment of the present invention;
fig. 8 is a schematic block diagram of a data processing apparatus according to a second embodiment of the present invention;
fig. 9 is a schematic structural diagram of a network device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to provide a better understanding of the present application; the technical solution claimed in the present application can, however, be implemented without these technical details and with various changes and modifications based on the following embodiments.
The first embodiment of the invention relates to a data processing method, wherein a plurality of shortest bifurcation trees are formed by carrying out rough clustering on a target data set by adopting a shortest bifurcation tree rough clustering algorithm, then the shortest bifurcation trees are pruned and merged by adopting a threshold pruning algorithm of a rough clustering neighborhood information system, the abnormal degree of a data object in the shortest bifurcation trees is calculated by utilizing an abnormal value detection algorithm of balanced fusion data local multi-feature factors, and an abnormal data value is determined and removed according to the abnormal degree of the data object. Because the data of the target data set is automatically analyzed by adopting the algorithm, the data analysis efficiency can be improved; meanwhile, due to the abnormal value detection algorithm of the balanced fusion data local multi-feature factor, local relative proximity is introduced into the standard local abnormal factor to replace local reachable density of the data object, the calculation ratio of neighborhood dispersion degree and distance is adjusted to a calculation mode suitable for rough clustering, and variation coefficient representation intra-class dispersion degree is introduced, so that the abnormal degree of the data object can be accurately and quantitatively analyzed, abnormal data values in original data (namely a target data set) are determined and removed according to the abnormal degree, and the accuracy of analysis results and decision is improved.
It should be noted that the execution body of the data processing method provided by the embodiment of the present invention may be a server, which may be implemented as a single server or as a server cluster composed of multiple servers; the following description takes a server as an example.
A specific flow of the data processing method provided by the embodiment of the present invention is shown in fig. 1, and includes the following steps:
s101: and acquiring a target data set, carrying out rough clustering on the target data set by adopting a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to a rough clustering result.
The target data set may be real-time data or offline data, for example, offline data of an enterprise, and when the target data set is real-time data, the target data set refers to data at a certain time.
Specifically, S101 may include: determining a source node in the target data set, searching for the nearest node of the source node, and taking that nearest node as the primary node; then, starting with the primary node as the current parent node, cyclically searching for the descendant node set with the current parent node as the starting point and the adaptive node spacing as the neighborhood search radius; if a new node exists within the neighborhood search radius, taking the new node as the current parent node and continuing to search for the descendant node set with the adaptive node spacing as the neighborhood search radius, until no new node exists within the neighborhood search radius; then ending the search, storing all nodes and node distances from the source node to the last-generation node, and forming a shortest bifurcation tree from all nodes from the source node to the last-generation node, wherein the node distances are the sets of distances between same-level nodes and their descendant nodes, and the adaptive node spacing is: Dist = arg min(Euclidean_dist(last-gener_i, next-gener_j)), where last-gener_i is a parent node in two adjacent generations, next-gener_j is a descendant node in two adjacent generations, and Euclidean_dist represents the Euclidean distance.
Please refer to fig. 2, which is a schematic diagram of an algorithm process of the shortest bifurcation tree rough clustering algorithm in the data processing method according to the embodiment of the present invention, and a specific process is described as an example below:
1. The server side collects the offline data of an enterprise as the target data set. All data objects in the target data set are assumed to be abnormal values and regarded as source nodes, so the number of data objects in the offline data equals the assumed number of source nodes; meanwhile, the relevant attributes (loca, value) of each source node are stored, where loca is the position of the data object and value is its data value.
2. When the global search is carried out, the global search strategy mainly comprises calculating the adaptive node spacing and determining adjacent nodes. Taking the search, starting from a source node, for the data sets of the next two generations of nodes as an example:
2-1. With an arbitrary source node x_i as the starting point, all data objects are traversed to determine the node with the closest distance as the primary node of the source node, namely x_i → x_i1, and it is necessary to ensure that the source node has only one primary node.
2-2. The distance |x_i, x_i1| between the source node and the primary node is calculated with a distance formula (e.g. the Euclidean distance), so the primary node contains three attributes (loca, value, |x_i, x_i1|). The Dist of two adjacent generations belonging to the same level is then calculated, and with the current level's next-gener_j node as the center and Dist as the neighborhood search radius, the next level is searched. Continuing the search with this idea, every node formed other than the source node contains three attributes defined as: x_j(loca, value, Dist), where the Dist attribute value represents the set of distances between the current node and the descendant nodes of the same hierarchy.
2-3. Taking the primary node as the starting point of the next level and the distance |x_i, x_i1| (the adaptive node spacing) as the neighborhood search radius, the set of next-generation child nodes of the primary node is searched; the search result is shown in fig. 3. The descendant node set of the primary node is not limited in the number of nodes, and all data objects within the neighborhood search radius belong to its child nodes, subject to the uniqueness principle. The uniqueness principle means that between two adjacent generations in the same layer, next-gener_j can only be generated by searching from last-gener_i; the mapping relationship can be one-to-one or one-to-many, but the data of two generations cannot intersect, that is: last-gener_i → next-gener_j and
last-gener_i ∩ next-gener_j = ∅.
3. Searching proceeds layer by layer according to the search strategy of 2-3, finally forming the shortest bifurcation tree (shortest forking tree, SFT), which is identified as one group of the rough clustering and comprises the complete node data set of the source node and its descendants together with the corresponding node-distance sets. The rough cluster formed, the shortest bifurcation tree, includes two types of data: first, the source node and the descendant node set searched from it; second, the set of distances between corresponding nodes of all adjacent generations forming the shortest bifurcation tree.
Since the outlier in the target dataset has the characteristics of low density and large distance of surrounding data objects in the neighborhood, the dispersion between the local outlier and the adjacent point is large. Assuming that the independent abnormal value is used as a source node, and the distance (namely, the self-adaptive node spacing) between different levels is used as a neighborhood searching radius, adjacent points of the independent abnormal value are gradually searched to form a complete tree structure and are identified as a rough category, so that the aim of dividing data into different clusters can be fulfilled.
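A minimal sketch of the level-by-level growth described above, assuming Euclidean distance and simplifying the adaptive spacing to the minimum distance between the two newest generations; names and the simplifications are assumptions, not the patent's exact procedure:

```python
import numpy as np

def grow_sft(points, source_idx):
    """Grow one shortest bifurcation tree (SFT) from an assumed source node.

    points : (n, d) array of data objects; source_idx : index of the source.
    The primary node is the globally nearest point to the source; each later
    generation collects all unassigned points within the adaptive radius
    Dist = min Euclidean distance between the two previous generations.
    Returns the tree as a list of generations (lists of point indices).
    """
    unassigned = set(range(len(points))) - {source_idx}
    # primary node: nearest neighbor of the source (exactly one)
    dists = {j: np.linalg.norm(points[source_idx] - points[j]) for j in unassigned}
    primary = min(dists, key=dists.get)
    unassigned.remove(primary)
    tree = [[source_idx], [primary]]
    radius = dists[primary]
    while unassigned:
        parents = tree[-1]
        children = sorted(j for j in unassigned
                          if any(np.linalg.norm(points[p] - points[j]) <= radius
                                 for p in parents))
        if not children:          # no new node inside the search radius: stop
            break
        unassigned -= set(children)
        # adaptive spacing: minimum distance between the two newest generations
        radius = min(np.linalg.norm(points[p] - points[j])
                     for p in parents for j in children)
        tree.append(children)
    return tree
```

On collinear points 0, 1, 2, 10 with source 0, the tree absorbs 0 → 1 → 2 and stops, leaving the distant point 10 for another cluster, which mirrors how outliers end up isolated in their own SFT.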
S102: and pruning and combining the shortest bifurcation tree by adopting a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain the simplified shortest bifurcation tree.
Please refer to fig. 4, which is a schematic diagram illustrating a process of processing a shortest bifurcation tree by using a threshold pruning algorithm of a rough clustering neighborhood information system in the data processing method according to the embodiment of the present invention.
Specifically, S102 may include: merging the branches containing shared nodes into one shortest bifurcation tree structure according to the attributes of each data object in the shortest bifurcation tree, and cutting off the completely intersected branches in the shortest bifurcation tree to obtain the simplified shortest bifurcation tree.
The method comprises the following specific steps:
1. Any shortest bifurcation tree formed in S101 is extracted, and the completely intersected branches are cut from it; that is, supposing there exist two different branches T_i and T_j with |T_i| ≥ |T_j|, the pruning condition is:
T_i ∩ T_j = T_j (that is, T_j is completely contained in T_i);
at this time, T_j is cut off, retaining only T_i.
2. Shared-node branch clustering: assuming there exist two different branches T_1 and T_2 with |T_1| ≥ |T_2|, the condition for shared-node clustering is that T_2 contains nodes of T_1; when it is met, the two branches are merged into T_1.
In a specific example, after the branches containing shared nodes are combined into one shortest bifurcation tree structure and the completely intersected branches in the shortest bifurcation tree are cut off to obtain the simplified shortest bifurcation tree, the method further includes: calculating the median and average of the sums of the distances of each data object in the shortest bifurcation tree according to the Dist attribute of each data object, and cutting off the branches whose score is less than or equal to a deviation threshold according to the deviation threshold formula, wherein the deviation threshold formula is: DEV = average + (average − median), where DEV is the deviation threshold, average is the mean, and median is the median; cutting off the branches whose score is less than or equal to the deviation threshold means: branches whose data object Dist values are less than or equal to the deviation threshold are pruned.
It should be understood that pruning branches with scores less than or equal to the deviation threshold according to the deviation threshold formula refers to pruning the weakly weighted branch cluster class with scores lower than the deviation threshold in the shortest bifurcation tree according to the deviation threshold formula.
By pruning the completely intersected branches in the rough-clustering shortest bifurcation tree and merging the branches containing shared nodes, the data structure of the shortest bifurcation tree can be further simplified, which facilitates subsequent data processing.
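The deviation-threshold step above can be sketched in Python. The dict-of-branch-scores representation and both function names are illustrative assumptions; the formula DEV = mean + |mean − median| follows the text:

```python
import statistics


def deviation_threshold(scores):
    """DEV = mean + |mean - median| over the branch scores."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    return mean + abs(mean - median)


def prune_weak_branches(branch_scores):
    """Drop weakly weighted branches whose score is <= the deviation threshold."""
    dev = deviation_threshold(list(branch_scores.values()))
    return {name: s for name, s in branch_scores.items() if s > dev}
```

For example, with branch scores {"a": 10.0, "b": 12.0, "c": 2.0} the mean is 8.0 and the median 10.0, so DEV = 10.0 and only branch "b" survives.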
S103: and calculating the abnormal degree of the data object in the simplified shortest bifurcation tree by adopting an abnormal value detection algorithm of the balanced fusion data local multi-feature factors, and determining and removing the abnormal data value in the target data set according to the abnormal degree.
In a specific example, S103 may include: standardizing the data in the simplified shortest bifurcation tree according to T_{i-stand} = T_i + |min(T_i)|; then calculating the distance between the nodes in the same shortest bifurcation tree according to:
[N_dis(x) formula, shown only as an image in the original publication]
where N_dis(x) is the calculated inter-node distance of the shortest bifurcation tree, x is a specified data object, x_i denotes the other data objects in the shortest bifurcation tree class, k represents the number of data objects in the class, and exp(1) represents the constant with base e and exponent 1; and then calculating the coefficient of variation of the data in the shortest bifurcation tree according to the following formulas:
Nstd(Tj) = sqrt( (1/k) · Σ_{q=1}^{k} (x_q − Nmean(Tj))² )
Nmean(Tj) = (1/k) · Σ_{q=1}^{k} x_q
Ncv(Tj) = Nstd(Tj) / Nmean(Tj)
wherein T represents the sum of the distances of all nodes in any shortest bifurcation tree cluster class, i represents the index label of T, x_q represents the distance of each node in the shortest bifurcation tree, k represents the number of nodes contained in the cluster class, j is the number of the shortest bifurcation tree, Nstd(T) is the standard deviation of the class, Nmean(T) is the mean of the class, and Ncv(T) is the coefficient of variation. The local relative proximity of the data objects in the class is then calculated according to:
[local relative proximity (LRP) formula, shown only as an image in the original publication]
and from the local relative proximity the degree of abnormality is calculated according to:
[MDILAF formula, shown only as an image in the original publication]
MDILAF is taken as the degree of abnormality of the data object, and the abnormal data values in the target data set are determined and eliminated according to the degree of abnormality, wherein Nx is the shortest bifurcation tree class of the data object x, and |N(x)| is the sum of the distances of all the remaining data objects in the class.
Fig. 5 is a schematic flowchart of the abnormal value detection algorithm of the balanced fusion data local multi-feature factors in the data processing method provided by the embodiment of the present invention.
Because the abnormal value detection algorithm of the balanced fusion data local multi-feature factor introduces local relative proximity (LRP) into the standard local outlier factor (LOF) to replace the local reachability density (LRD) of the data object, adjusts the calculation ratio of neighborhood dispersion degree to distance into a calculation mode suitable for rough clustering, and introduces the coefficient of variation to characterize the intra-class dispersion degree, the degree of abnormality of a data object can be accurately and quantitatively analyzed, and the data objects judged abnormal (namely the abnormal data values) are eliminated.
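Since the exact N_dis, LRP and MDILAF formulas are published only as images, the sketch below illustrates just the coefficient-of-variation idea on one cluster class; the flagging rule and all names are assumptions for illustration, not the patented score:

```python
import math


def coefficient_of_variation(distances):
    """Population CV = std / mean of the node distances in one cluster class."""
    k = len(distances)
    mean = sum(distances) / k
    std = math.sqrt(sum((x - mean) ** 2 for x in distances) / k)
    return std / mean


def flag_dispersed_objects(distances, factor=1.0):
    """Illustrative rule: flag objects whose distance exceeds mean*(1 + factor*CV)."""
    mean = sum(distances) / len(distances)
    limit = mean * (1.0 + factor * coefficient_of_variation(distances))
    return [i for i, x in enumerate(distances) if x > limit]
```

A tight class yields CV = 0 and nothing is flagged; a class with one far-away object flags that object's index.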
In a specific example, after S103, the method further includes: performing dimensionality reduction on the target data set using an improved sparse autoencoder, wherein the improved sparse autoencoder uses a sparse rule operator in place of the KL relative entropy as the sparsity constraint term, and uses the L2 norm as the regularization term.
Specifically, the sparse autoencoder uses the activation of the hidden-layer neurons: a_j(X) represents the degree of activation of hidden neuron j of the autoencoder neural network given input X. The average activation of hidden neuron j of the sparse autoencoder is defined as:
ρ̂_j = (1/m) · Σ_{i=1}^{m} a_j(x^{(i)})
where m is the number of training samples and the index value j labels each neuron position.
The loss function of the original sparse autoencoder is generally expressed as the mean square error, with a KL-divergence term added on that basis as the sparsity constraint. The specific formulas are:
J(W, b) = (1/m) · Σ_{i=1}^{m} (1/2) · ‖h_{W,b}(x^{(i)}) − x^{(i)}‖²
KL(ρ ‖ ρ̂_j) = ρ · log(ρ / ρ̂_j) + (1 − ρ) · log((1 − ρ) / (1 − ρ̂_j))
J_sparse(W, b) = J(W, b) + β · Σ_{j=1}^{s2} KL(ρ ‖ ρ̂_j)
where β is the penalty factor, Σ_{j=1}^{s2} KL(ρ ‖ ρ̂_j) is the penalty term, and ρ is the target sparsity. The update mechanism of the sparse autoencoder is as follows:
[sparse autoencoder parameter update rule, shown only as an image in the original publication; it involves the derivative of the layer output z in the neural network]
When the improved sparse autoencoder is used to reduce the dimensionality of the target data set, the method may specifically include: constructing the following objective loss function from the improved sparse autoencoder:
Jspare(W, b) = J(W, b) + λ1 · Σ_{j=1}^{s2} |ρ̂_j| + λ2 · ‖W‖²
wherein λ1 is the sparse penalty term weight, λ2 is the weight attenuation coefficient, s2 represents the number of hidden-layer neurons, W represents the neural network weight coefficients, b represents the neural network bias term, j represents the index of a neuron in the range [1, s2], J(W, b) represents the initial loss function term of the sparse autoencoder, and Jspare(W, b) represents the objective loss function of the improved sparse autoencoder; the dimensionality of the target data set is then reduced according to the objective loss function.
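Because the patent publishes the exact objective only as an image, the following pure-Python sketch is an assumed reading of it: reconstruction error plus an L1 penalty on the average hidden activations (replacing the KL term) plus L2 weight decay. All names are illustrative:

```python
def improved_sparse_loss(X, X_hat, hidden_acts, W, lam1=1e-3, lam2=1e-4):
    """J(W,b) + lam1 * sum_j |rho_hat_j| + lam2 * ||W||^2 -- an assumed reading."""
    m = len(X)
    # J(W, b): mean over samples of 0.5 * squared reconstruction error
    recon = sum(
        0.5 * sum((xh - x) ** 2 for x, xh in zip(row, row_hat))
        for row, row_hat in zip(X, X_hat)
    ) / m
    # average activation rho_hat_j of each hidden neuron over the samples
    n, s2 = len(hidden_acts), len(hidden_acts[0])
    rho_hat = [sum(acts[j] for acts in hidden_acts) / n for j in range(s2)]
    sparsity = lam1 * sum(abs(r) for r in rho_hat)              # L1 replaces KL
    weight_decay = lam2 * sum(w * w for row in W for w in row)  # L2 regular term
    return recon + sparsity + weight_decay
```

The L1 term drives average activations toward zero without the numerical issues KL divergence has when ρ̂_j approaches 0 or 1, which is one plausible motivation for the substitution described above.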
Specifically, according to the constructed objective loss function, the neural network parameter update mechanism becomes:
[updated parameter update rule, shown only as an image in the original publication]
Reference may be made to fig. 6, which is a schematic diagram of the network mechanism of the improved sparse autoencoder in the data processing method according to the embodiment of the present invention.
Fig. 7 is a flowchart illustrating a data processing method according to an embodiment of the present invention.
By replacing the KL relative entropy with a sparse rule operator as the sparsity constraint term, the sparsity performance of the algorithm can be improved; by using the L2 norm as the regularization term, the polynomial component weights can be balanced, improving the sparse autoencoder's ability to prevent overfitting when processing data. Meanwhile, performing data dimensionality reduction with the improved sparse autoencoder on the data after outlier detection can reduce data redundancy and improve the simplicity and reliability of the data.
The data processing method provided by the embodiment of the invention uses the shortest bifurcation tree rough clustering algorithm to roughly cluster a target data set into a plurality of shortest bifurcation trees, then uses the threshold pruning algorithm of the rough clustering neighborhood information system to prune and merge the shortest bifurcation trees, then uses the abnormal value detection algorithm of the balanced fusion data local multi-feature factors to calculate the degree of abnormality of the data objects in the shortest bifurcation trees, and determines and eliminates the abnormal data values according to the degree of abnormality of the data objects. Because the data of the target data set is analyzed automatically by these algorithms, the efficiency of data analysis can be improved. Meanwhile, because the abnormal value detection algorithm of the balanced fusion data local multi-feature factor introduces local relative proximity into the standard local outlier factor to replace the local reachability density of the data object, adjusts the calculation ratio of neighborhood dispersion degree to distance into a calculation mode suitable for rough clustering, and introduces the coefficient of variation to characterize the intra-class dispersion degree, the degree of abnormality of a data object can be accurately and quantitatively analyzed, and the abnormal data values in the original data (namely the target data set) are determined and eliminated according to the degree of abnormality, improving the accuracy of analysis results and decisions.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or some steps may be split into multiple steps, and as long as the same logical relationship is included, they are within the protection scope of this patent; adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes without changing the core design of the algorithms or processes, is also within the protection scope of this patent.
A second embodiment of the present invention relates to a data processing apparatus 200, as shown in fig. 8, including: the clustering module 201, the processing module 202 and the determining module 203, the functions of each module are described in detail as follows:
the clustering module 201 is configured to obtain a target data set, perform rough clustering on the target data set by using a shortest bifurcation tree rough clustering algorithm, and form a plurality of shortest bifurcation trees according to a rough clustering result;
the processing module 202 is configured to prune and merge the shortest bifurcation trees by using a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain a reduced shortest bifurcation tree;
the determining module 203 is configured to calculate an abnormal degree of the data object in the reduced shortest bifurcation tree by using an abnormal value detection algorithm for the balanced fusion data local multi-feature factor, and determine and remove an abnormal data value in the target data set according to the abnormal degree.
Further, the data processing apparatus 200 provided by the embodiment of the present invention further includes a dimension reduction module, where the dimension reduction module is configured to: perform dimensionality reduction on the target data set using an improved sparse autoencoder, wherein the improved sparse autoencoder uses a sparse rule operator in place of the KL relative entropy as the sparsity constraint term, and uses the L2 norm as the regularization term.
Further, the dimension reduction module is further configured to:
the following objective loss function is constructed from the improved sparse autoencoder:
Jspare(W, b) = J(W, b) + λ1 · Σ_{j=1}^{s2} |ρ̂_j| + λ2 · ‖W‖²
wherein said λ1 is the sparse penalty term weight, said λ2 is the weight attenuation coefficient, s2 represents the number of hidden-layer neurons, W represents the neural network weight coefficients, b represents the neural network bias term, j represents the index of a neuron, J(W, b) represents the initial loss function term of the sparse autoencoder, and Jspare(W, b) represents the objective loss function of the improved sparse autoencoder;
and reducing the dimension of the target data set according to the target loss function.
Further, the determining module 203 is specifically configured to:
according to T_{i-stand} = T_i + |min(T_i)|, standardizing the data in the simplified shortest bifurcation tree;
according to
[N_dis(x) formula, shown only as an image in the original publication]
calculating the distance between the nodes in the same shortest bifurcation tree, wherein N_dis(x) is the calculated inter-node distance of the shortest bifurcation tree, x is a specified data object, x_i denotes the other data objects in the shortest bifurcation tree class, k represents the number of data objects in the class, and exp(1) represents the constant with base e and exponent 1;
calculating the variation coefficient of the data in the shortest bifurcation tree according to the following formula respectively:
Nstd(Tj) = sqrt( (1/k) · Σ_{q=1}^{k} (x_q − Nmean(Tj))² )
Nmean(Tj) = (1/k) · Σ_{q=1}^{k} x_q
Ncv(Tj) = Nstd(Tj) / Nmean(Tj)
wherein the T represents the sum of the distances of all nodes in any shortest bifurcation tree cluster class, i represents the index label of T, x_q represents the distance of each node in the shortest bifurcation tree, k represents the number of nodes contained in the cluster class, Nstd(T) is the standard deviation of the class, j denotes the number of the shortest bifurcation tree, Nmean(T) is the mean of the class, and Ncv(T) is the coefficient of variation;
according to
[local relative proximity (LRP) formula, shown only as an image in the original publication]
Calculating local relative proximity of data objects in the class;
calculating according to local relative proximity
[MDILAF formula, shown only as an image in the original publication]
Taking the MDILAF as the degree of abnormality of a data object, and determining and eliminating the abnormal data values in the target data set according to the degree of abnormality, wherein Nx is the shortest bifurcation tree class of the data object x, and |N(x)| is the sum of the distances of all the remaining data objects in the class.
Further, the clustering module 201 is specifically configured to:
determining a source node in the target dataset;
searching the nearest node of the source node, and taking the nearest node of the source node as a primary node;
starting with the initial generation node as the current parent node, cyclically executing a search for the descendant node set taking the current parent node as the starting point and the adaptive node spacing as the neighborhood search radius; if a new node exists within the neighborhood search radius, taking the new node as the current parent node and continuing to search for the descendant node set with the adaptive node spacing as the neighborhood search radius, until no new node exists within the neighborhood search radius; then ending the search, storing all nodes and node distances from the source node to the last generation node, and forming a shortest bifurcation tree from all the nodes from the source node to the last generation node, wherein the node distances are the set of distances between nodes of the same level and their descendant nodes, and the adaptive node spacing is: Dist = arg min(last-gener_i, next-gener_j), wherein last-gener_i is a parent node in two adjacent generations and next-gener_j is a descendant node in two adjacent generations.
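The search loop described above can be sketched roughly as follows, using 1-D points and a fixed search radius for brevity; in the patent the radius is the adaptive node spacing Dist, and the function and variable names here are illustrative assumptions:

```python
def grow_shortest_bifurcation_tree(points, source_idx, radius):
    """Chain from the source node to successive nearest nodes within `radius`."""
    unvisited = set(range(len(points))) - {source_idx}
    order, node_dists = [source_idx], []
    current = source_idx
    while True:
        in_radius = [
            (abs(points[i] - points[current]), i)
            for i in unvisited
            if abs(points[i] - points[current]) <= radius
        ]
        if not in_radius:
            break  # no new node within the neighborhood search radius
        d, nxt = min(in_radius)
        order.append(nxt)
        node_dists.append(d)
        unvisited.remove(nxt)
        current = nxt  # the new node becomes the current parent node
    return order, node_dists
```

For points [0.0, 1.0, 2.0, 10.0] with source 0 and radius 1.5, the tree grows 0 → 1 → 2 and stops, since node 3 lies outside the search radius.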
Further, the processing module 202 is specifically configured to:
and combining the branches containing the shared nodes into a shortest branched tree structure according to the attribute of each data object in the shortest branched tree, and cutting off the completely intersected branches in the shortest branched tree to obtain the simplified shortest branched tree.
Further, the processing module 202 is further configured to:
according to the Dist attribute of each data object in the shortest bifurcation tree, calculating the median and the average of the sum of the distances of each data object in the shortest bifurcation tree, and pruning the branches whose score is less than or equal to a deviation threshold according to the deviation threshold formula DEV = mean + |mean − median|, wherein the DEV is the deviation threshold, the mean is the average, and the median is the median.
It should be understood that this embodiment is a device embodiment corresponding to the first embodiment, and that this embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module, and in practical applications, one logical unit may be one physical unit, may be a part of one physical unit, and may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not so closely related to solving the technical problems proposed by the present invention are not introduced in the present embodiment, but this does not indicate that other elements are not present in the present embodiment.
A third embodiment of the present invention relates to a network device, as shown in fig. 9, including at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301, so that the at least one processor 301 can execute the data processing method.
Where the memory 302 and the processor 301 are coupled in a bus, the bus may comprise any number of interconnected buses and bridges, the buses coupling one or more of the various circuits of the processor 301 and the memory 302. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 301 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 301.
The processor 301 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 302 may be used to store data used by processor 301 in performing operations.
A fourth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, where the program is stored in a storage medium and includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments for practicing the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A data processing method, comprising:
acquiring a target data set, carrying out rough clustering on the target data set by adopting a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to a rough clustering result;
pruning and combining the shortest bifurcation tree by adopting a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain a simplified shortest bifurcation tree;
and calculating the abnormal degree of the data object in the simplified shortest bifurcation tree by adopting an abnormal value detection algorithm of the balanced fusion data local multi-feature factor, and determining and removing the abnormal data value in the target data set according to the abnormal degree.
2. The data processing method according to claim 1, wherein after the computing the degree of abnormality of the data object in the reduced shortest bifurcation tree by the abnormal value detection algorithm using the balanced fusion data local multi-feature factor, and determining and eliminating the abnormal data value in the target data set according to the degree of abnormality of the data object, the method further comprises:
and adopting an improved sparse autoencoder to perform dimensionality reduction on the target data set, wherein the improved sparse autoencoder adopts a sparse rule operator to replace KL relative entropy as a sparsity constraint item, and adopts an L2 norm as a regular item.
3. The data processing method of claim 2, wherein the dimensionality reduction of the target data set using the improved sparse auto-encoder comprises:
the following objective loss function is constructed from the improved sparse autoencoder:
Jspare(W, b) = J(W, b) + λ1 · Σ_{j=1}^{s2} |ρ̂_j| + λ2 · ‖W‖²
wherein said λ1 is the sparse penalty term weight, said λ2 is the weight attenuation coefficient, s2 represents the number of hidden-layer neurons, W represents the neural network weight coefficients, b represents the neural network bias term, j represents the index of a neuron, J(W, b) represents the initial loss function term of the sparse autoencoder, and Jspare(W, b) represents the objective loss function of the improved sparse autoencoder;
and reducing the dimension of the target data set according to the target loss function.
4. The data processing method according to claim 1, wherein the calculating the degree of abnormality of the data objects in the reduced shortest bifurcation tree by using an abnormal value detection algorithm for equalizing and fusing local multi-feature factors of data, and determining and removing abnormal data values in the target data set according to the degree of abnormality of the data objects comprises:
according to T_{i-stand} = T_i + |min(T_i)|, standardizing the data in the simplified shortest bifurcation tree;
according to
[N_dis(x) formula, shown only as an image in the original publication]
calculating the distance between the nodes in the same shortest bifurcation tree, wherein N_dis(x) is the calculated inter-node distance of the shortest bifurcation tree, x is a specified data object, x_i denotes the other data objects in the shortest bifurcation tree class, k represents the number of data objects in the class, and exp(1) represents the constant with base e and exponent 1;
calculating the variation coefficient of the data in the shortest bifurcation tree according to the following formula respectively:
Nstd(Tj) = sqrt( (1/k) · Σ_{q=1}^{k} (x_q − Nmean(Tj))² )
Nmean(Tj) = (1/k) · Σ_{q=1}^{k} x_q
Ncv(Tj) = Nstd(Tj) / Nmean(Tj)
wherein the T represents the sum of the distances of all nodes in any shortest bifurcation tree cluster class, i represents the index label of T, x_q represents the distance of each node in the shortest bifurcation tree, k represents the number of nodes contained in the cluster class, Nstd(T) is the standard deviation of the class, j represents the number of the shortest bifurcation tree, Nmean(T) is the mean of the class, and Ncv(T) is the coefficient of variation;
according to
[local relative proximity (LRP) formula, shown only as an image in the original publication]
Calculating local relative proximity of data objects in the class;
calculating according to local relative proximity
[MDILAF formula, shown only as an image in the original publication]
Taking the MDILAF as the degree of abnormality of a data object, and determining and eliminating the abnormal data values in the target data set according to the degree of abnormality, wherein Nx is the shortest bifurcation tree class of the data object x, and the |N(x)| is the sum of the distances of all the remaining data objects in the class.
5. The data processing method of claim 1, wherein performing rough clustering on the target data set by using the shortest bifurcation tree rough clustering algorithm and forming a plurality of shortest bifurcation trees according to the rough clustering result comprises:
determining a source node in the target dataset;
searching the nearest node of the source node, and taking the nearest node of the source node as a primary node;
starting with the initial generation node as the current parent node, cyclically executing a search for the descendant node set taking the current parent node as the starting point and the adaptive node spacing as the neighborhood search radius; if a new node exists within the neighborhood search radius, taking the new node as the current parent node and continuing to search for the descendant node set with the adaptive node spacing as the neighborhood search radius, until no new node exists within the neighborhood search radius; then ending the search, storing all nodes and node distances from the source node to the last generation node, and forming a shortest bifurcation tree from all the nodes from the source node to the last generation node, wherein the node distances are the set of distances between nodes of the same level and their descendant nodes, and the adaptive node spacing is: Dist = arg min(last-gener_i, next-gener_j), wherein last-gener_i is a parent node in two adjacent generations and next-gener_j is a descendant node in two adjacent generations.
6. The data processing method according to any one of claims 1 to 5, wherein the pruning and merging the shortest bifurcation tree by using a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain a reduced shortest bifurcation tree comprises:
and combining the branches containing the shared nodes into a shortest branched tree structure according to the attribute of each data object in the shortest branched tree, and cutting off the completely intersected branches in the shortest branched tree to obtain the simplified shortest branched tree.
7. The data processing method of claim 4, wherein after combining the branches including the shared node into a shortest bifurcated tree structure and pruning completely intersected branches in the shortest bifurcated tree, the method further comprises:
according to the Dist attribute of each data object in the shortest bifurcation tree, calculating the median and the average of the sum of the distances of each data object in the shortest bifurcation tree, and pruning the branches whose score is less than or equal to a deviation threshold according to the deviation threshold formula DEV = mean + |mean − median|, wherein the DEV is the deviation threshold, the mean is the average, and the median is the median.
8. A data processing apparatus, comprising:
the clustering module is used for acquiring a target data set, carrying out rough clustering on the target data set by adopting a shortest bifurcation tree rough clustering algorithm, and forming a plurality of shortest bifurcation trees according to a rough clustering result;
the processing module is used for pruning and combining the shortest bifurcation tree by adopting a threshold pruning algorithm based on a rough clustering neighborhood information system to obtain a simplified shortest bifurcation tree;
and the determining module is used for calculating the abnormal degree of the data object in the simplified shortest bifurcation tree by adopting an abnormal value detection algorithm of the balanced fusion data local multi-feature factor, and determining and removing the abnormal data value in the target data set according to the abnormal degree.
9. A network device, comprising:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 7.
CN202110678862.XA 2021-06-18 2021-06-18 Data processing method, device, network equipment and storage medium Pending CN113420804A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110678862.XA CN113420804A (en) 2021-06-18 2021-06-18 Data processing method, device, network equipment and storage medium
PCT/CN2022/099638 WO2022262869A1 (en) 2021-06-18 2022-06-17 Data processing method and apparatus, network device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110678862.XA CN113420804A (en) 2021-06-18 2021-06-18 Data processing method, device, network equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113420804A true CN113420804A (en) 2021-09-21

Family

ID=77789079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110678862.XA Pending CN113420804A (en) 2021-06-18 2021-06-18 Data processing method, device, network equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113420804A (en)
WO (1) WO2022262869A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272216B (en) * 2023-11-22 2024-02-09 中国建材检验认证集团湖南有限公司 Data analysis method for automatic flow monitoring station and manual water gauge observation station
CN117370331B (en) * 2023-12-08 2024-02-20 河北建投水务投资有限公司 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061213A1 (en) * 2001-07-31 2003-03-27 International Business Machines Corporation Method for building space-splitting decision tree
CN111985837A (en) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Risk analysis method, device and equipment based on hierarchical clustering and storage medium
CN112800148A (en) * 2021-02-04 2021-05-14 国网福建省电力有限公司 Scattered pollutant enterprise research and judgment method based on clustering feature tree and outlier quantization

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444247B (en) * 2020-06-17 2023-10-17 北京必示科技有限公司 Root cause positioning method, root cause positioning device and storage medium based on KPI (key performance indicator)
CN113420804A (en) * 2021-06-18 2021-09-21 工业互联网创新中心(上海)有限公司 Data processing method, device, network equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022262869A1 (en) * 2021-06-18 2022-12-22 工业互联网创新中心(上海)有限公司 Data processing method and apparatus, network device, and storage medium
CN114742178A (en) * 2022-06-10 2022-07-12 航天亮丽电气有限责任公司 Method for non-invasive pressure plate state monitoring through MEMS six-axis sensor
CN114742178B (en) * 2022-06-10 2022-11-08 航天亮丽电气有限责任公司 Method for non-invasive pressure plate state monitoring through MEMS six-axis sensor
CN115202661A (en) * 2022-09-15 2022-10-18 深圳大学 Hybrid generation method with hierarchical structure layout and related equipment
CN115202661B (en) * 2022-09-15 2022-11-29 深圳大学 Hybrid generation method with hierarchical structure layout and related equipment

Also Published As

Publication number Publication date
WO2022262869A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
CN113420804A (en) Data processing method, device, network equipment and storage medium
US20210049512A1 (en) Explainers for machine learning classifiers
CN112529153B (en) BERT model fine tuning method and device based on convolutional neural network
CN110428137B (en) Updating method and device of risk prevention and control strategy
WO2020228378A1 (en) Method and device for determining database configuration parameters
US10592634B1 (en) Systems and methods for automatic handling of engineering design parameter violations
CN112765477A (en) Information processing method and device, information recommendation method and device, electronic equipment and storage medium
CN110310114A (en) Object classification method, device, server and storage medium
Mitra et al. Feature selection using structural similarity
Perez-Godoy et al. CO2RBFN: an evolutionary cooperative–competitive RBFN design algorithm for classification problems
Anderson et al. The rankability of data
WO2008156595A1 (en) Hybrid method for simulation optimization
KR20210066545A (en) Electronic device, method, and computer readable medium for simulation of semiconductor device
Yan et al. A clustering algorithm for multi-modal heterogeneous big data with abnormal data
CN113642727B (en) Training method of neural network model and processing method and device of multimedia information
CN114185761A (en) Log collection method, device and equipment
Tembine Mean field stochastic games: Convergence, Q/H-learning and optimality
CN113204642A (en) Text clustering method and device, storage medium and electronic equipment
US11315036B2 (en) Prediction for time series data using a space partitioning data structure
US20150134307A1 (en) Creating understandable models for numerous modeling tasks
CN110084376B (en) Method and device for automatically separating data into boxes
Stotz et al. Incremental graph matching for situation awareness
US11275816B2 (en) Selection of Pauli strings for Variational Quantum Eigensolver
CN115412401B (en) Method and device for training virtual network embedding model and virtual network embedding
CN110995384A (en) Broadcast master control fault trend prejudging method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination