CN112115304A - Partial order data processing method, device and system and storage medium - Google Patents

Info

Publication number
CN112115304A
Authority
CN
China
Prior art keywords
data
partial order
processing
directed graph
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910538902.3A
Other languages
Chinese (zh)
Inventor
陈冠霖
李世雷
王轶凡
张钋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu China Co Ltd
Original Assignee
Baidu China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu China Co Ltd filed Critical Baidu China Co Ltd
Priority to CN201910538902.3A priority Critical patent/CN112115304A/en
Publication of CN112115304A publication Critical patent/CN112115304A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, a system, and a storage medium for processing partial order data. The method comprises: acquiring a partial order relation directed graph corresponding to a data set; cutting the partial order relation directed graph into mutually independent subgraphs; and, if a ring structure corresponding to noise data exists on a subgraph, pruning edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated. The invention converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relation graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.

Description

Partial order data processing method, device and system and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a system, and a storage medium for processing partial order data.
Background
With the development of internet technology, the volume of data to be processed keeps growing. Many databases hold data in a partial order relationship, that is, at least part of the data in a data set has an order of precedence. As new data is continuously introduced, data with conflicting partial order relationships (also called noise data) may enter the database.
At present, noise data is mainly handled in one of three ways: (1) the noise data is ignored and not processed; (2) the noise data is filtered out by manual screening; (3) an arbitrary node in a data chain is selected as the starting node, the whole data chain is traversed from that node to obtain its partial order relationship, the partial order relationship is checked for conflicts and all conflicting data is corrected, and then a new data chain is selected and processed in the same way until all data chains have been checked.
However, in scenarios where the requirements on the data are high, noise data generally cannot be ignored, and the existing processing methods involve a very large amount of computation, have low processing efficiency, and can hardly meet the requirements of massive data processing.
Disclosure of Invention
The invention provides a method, a device, a system, and a storage medium for processing partial order data, which convert the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relation graph, so as to avoid many invalid traversal operations, thereby effectively improving data processing efficiency and meeting the processing requirements of massive data.
In a first aspect, an embodiment of the present invention provides a method for processing partial order data, including:
acquiring a partial order relation directed graph corresponding to a data set;
cutting the partial order relationship directed graph into mutually independent sub-graphs;
and if a ring structure corresponding to noise data exists on a subgraph, pruning edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated.
In one possible design, obtaining a partial order relationship directed graph corresponding to a data set includes:
constructing a partial order relation directed graph according to the partial order relation among the data in the data set; wherein, the nodes in the partial ordering relationship directed graph correspond to the data identifiers in the data set; and the edges connecting the nodes in the partial order relationship directed graph are used for representing the partial order relationship among the nodes.
In one possible design, the partial order relationship directed graph is cut into mutually independent subgraphs, including:
according to the query, cutting the partial order relationship directed graph into mutually independent sub-graphs; wherein each query corresponds to a subgraph.
In one possible design, further comprising:
determining whether a ring structure corresponding to noise data exists on the subgraph.
In one possible design, determining whether a ring structure corresponding to noise data exists on the subgraph includes:
determining whether a ring structure exists on the subgraph by traversing the nodes of the subgraph; or acquiring an adjacency list of the subgraph and determining whether a ring structure exists on the subgraph according to the connection relationships of the nodes in the adjacency list.
In one possible design, pruning edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated includes:
traversing the ring structures in the subgraphs in a distributed manner, where traversal means starting from any node of a ring structure and visiting its other nodes in turn to obtain all edges of the ring structure and the weight of each edge;
and cutting at least one edge of the ring structures of all subgraphs, in ascending order of weight, until no ring structure remains in the subgraphs.
In one possible design, the method further includes: if edges with the same weight exist in the ring structure, pruning edges according to the principle of the minimum number of pruning operations.
In a second aspect, an embodiment of the present invention provides an apparatus for processing partial order data, including:
the acquisition module is used for acquiring a partial order relation directed graph corresponding to the data set;
the segmentation module is used for segmenting the partial order relation directed graph into mutually independent sub-graphs;
and the processing module is used for pruning edges of the ring structure according to a preset strategy when a ring structure corresponding to noise data exists on a subgraph, to obtain partial order data with the noise eliminated.
In one possible design, the obtaining module is specifically configured to:
constructing a partial order relation directed graph according to the partial order relation among the data in the data set; wherein, the nodes in the partial ordering relationship directed graph correspond to the data identifiers in the data set; and the edges connecting the nodes in the partial order relationship directed graph are used for representing the partial order relationship among the nodes.
In one possible design, the segmentation module is specifically configured to:
according to the query, cutting the partial order relationship directed graph into mutually independent sub-graphs; wherein each query corresponds to a subgraph.
In one possible design, further comprising:
and the determining module is used for determining whether a ring structure corresponding to noise data exists on the subgraph.
In one possible design, the determining module is specifically configured to:
determining whether a ring structure exists on the subgraph by traversing the nodes of the subgraph; or acquiring an adjacency list of the subgraph and determining whether a ring structure exists on the subgraph according to the connection relationships of the nodes in the adjacency list.
In one possible design, the processing module is specifically configured to:
traversing the ring structures in the subgraphs in a distributed manner, where traversal means starting from any node of a ring structure and visiting its other nodes in turn to obtain all edges of the ring structure and the weight of each edge;
and cutting at least one edge of the ring structures of all subgraphs, in ascending order of weight, until no ring structure remains in the subgraphs.
In one possible design, the processing module is further configured to: if edges with the same weight exist in the ring structure, prune edges according to the principle of the minimum number of pruning operations.
In a third aspect, an embodiment of the present invention provides a system for processing partial order data, including: the device comprises a memory and a processor, wherein the memory stores executable instructions of the processor; wherein the processor is configured to perform the method of processing partially ordered data of any of the first aspect via execution of the executable instructions.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for processing partial order data according to any one of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a program product, where the program product includes a computer program stored in a readable storage medium; at least one processor of a server can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the server to perform the method for processing partial order data according to any one of the first aspect.
The invention provides a method, a device, a system, and a storage medium for processing partial order data. A partial order relation directed graph corresponding to a data set is acquired; the partial order relation directed graph is cut into mutually independent subgraphs; and, if a ring structure corresponding to noise data exists on a subgraph, edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relation graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an application scenario of the present invention;
FIG. 2 is a flowchart of a method for processing partial order data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a directed graph of partial order relationships according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of noise cancellation provided by an embodiment of the present invention;
FIG. 5 is a flowchart of a method for processing partial order data according to a second embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a partial order data processing apparatus according to a third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a partial order data processing apparatus according to a fourth embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a partial order data processing system according to a fifth embodiment of the present invention.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
In the following, some terms in the present application are explained to facilitate understanding by those skilled in the art:
A partially ordered set is, in mathematical order theory, a set equipped with a partial order relation. The theory abstracts the intuitive notion of ordering, sequencing, or arranging the elements of a set. Such an ordering does not necessarily guarantee that all objects in the set are mutually comparable; if the set is only partially ordered in this way, it defines a partial order topology.
Definition 1: Let $S$ and $T$ be sets. The set $S \times T = \{(s, t) \mid s \in S,\ t \in T\}$ is called the Cartesian product of $S$ and $T$.
Definition 2: A subset $R$ of the Cartesian product $S \times S$ is called a relation on $S$. Let $(s_1, s_2) \in S \times S$ and let $R$ be a relation on $S$. If $(s_1, s_2) \in R$, then $s_1$ is said to be related to $s_2$ by $R$ (written $s_1 R s_2$); otherwise $s_1$ is said not to be related to $s_2$ by $R$.
Definition 3: The relation ">" on the set $S$ is called a strict partial order relation if the following three conditions hold for any $a, b, c \in S$:
Irreflexivity: $a \not> a$;
Antisymmetry: if $a > b$ and $b > a$, then $a = b$;
Transitivity: if $a > b$ and $b > c$, then $a > c$.
A set $S$ with such a partial order relation is called a partially ordered set, denoted $(S, >)$.
Definition 4: Let $A$ be a subset of the partially ordered set $(S, >)$. If $a > b$ or $b > a$ holds for any $a, b \in A$, then $A$ is called a chain of $S$.
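The conditions in Definitions 3 and 4 can be checked mechanically on a small finite relation. The following is a minimal sketch, not part of the patent: the function names, the representation of ">" as a set of (a, b) pairs, and the divisibility example are all assumptions made for illustration. The chain check compares only distinct elements, which is also an assumption, since a > a can never hold under irreflexivity.

```python
from itertools import product


def is_strict_partial_order(elements, greater):
    """Check Definition 3; `greater` is a set of (a, b) pairs meaning a > b."""
    # Irreflexivity: no element is greater than itself.
    irreflexive = all((a, a) not in greater for a in elements)
    # Antisymmetry: a > b and b > a may only hold together when a == b.
    antisymmetric = all(a == b for (a, b) in greater if (b, a) in greater)
    # Transitivity: a > b and b > c must imply a > c.
    transitive = all(
        (a, c) in greater
        for (a, b), (b2, c) in product(greater, repeat=2)
        if b == b2
    )
    return irreflexive and antisymmetric and transitive


def is_chain(subset, greater):
    """Check Definition 4: any two distinct elements of the subset are comparable."""
    return all(
        (a, b) in greater or (b, a) in greater
        for a in subset
        for b in subset
        if a != b
    )


# Hypothetical example: b > a when a properly divides b, on S = {1, 2, 3, 4}.
S = {1, 2, 3, 4}
divides = {(b, a) for a in S for b in S if a != b and b % a == 0}

print(is_strict_partial_order(S, divides))  # True
print(is_chain({1, 2, 4}, divides))         # True
print(is_chain({2, 3}, divides))            # False: 2 and 3 are not comparable
```

The pairwise transitivity check is quadratic in the number of pairs, so it is only meant for small illustrative relations.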
Assume the partial order data set (S, >) = {A > B, B > C, C > A}. By the transitivity of the strict (irreflexive) partial order relation, A > C can be derived from A > B and B > C; but C > A already exists in the set, so the two relations conflict. In actual application scenarios, conflicting noise data such as {(A > B, B > C) -> A > C, C > A} greatly affects the accuracy of the data, and such noise becomes harder to find as the number of transitive steps and the size of the set grow.
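The conflict in this example can be reproduced by deriving every relation implied by transitivity and then looking for pairs that hold in both directions. This is a naive sketch for illustration only; the function names are hypothetical, and it is the kind of exhaustive derivation that the ring-detection approach of the invention is meant to avoid on large data.

```python
def transitive_closure(pairs):
    """Return all (a, b) with a > b that follow from `pairs` by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def find_conflicts(pairs):
    """Return unordered pairs {a, b} for which both a > b and b > a are derivable."""
    closure = transitive_closure(pairs)
    # `a < b` only deduplicates the pair; it assumes the identifiers are sortable.
    return sorted((a, b) for (a, b) in closure if a < b and (b, a) in closure)


noisy = {("A", "B"), ("B", "C"), ("C", "A")}
clean = {("A", "B"), ("B", "C")}

print(find_conflicts(noisy))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(find_conflicts(clean))  # []
```

Every pair on the ring A -> B -> C -> A ends up conflicting, which is why a single ring in the graph is enough to signal noise.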
With the development of internet technology, the volume of data to be processed keeps growing. Many databases hold data in a partial order relationship, that is, at least part of the data in a data set has an order of precedence. As new data is continuously introduced, data with conflicting partial order relationships (also called noise data) may enter the database. At present, noise data is mainly handled in one of three ways: (1) the noise data is ignored and not processed; (2) the noise data is filtered out by manual screening; (3) an arbitrary node in a data chain is selected as the starting node, the whole data chain is traversed from that node to obtain its partial order relationship, the partial order relationship is checked for conflicts and all conflicting data is corrected, and then a new data chain is selected and processed in the same way until all data chains have been checked. However, in scenarios where the requirements on the data are high, noise data generally cannot be ignored, and the existing processing methods involve a very large amount of computation, have low processing efficiency, and can hardly meet the requirements of massive data processing.
In view of the above technical problems, the present invention provides a method, an apparatus, a system, and a storage medium for processing partial order data, which convert the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relation graph, so as to avoid many invalid traversal operations, thereby effectively improving data processing efficiency and meeting the processing requirements of massive data. The method can be applied to fields such as preprocessing of partial order relation data, automatic evaluation of search engine results, partial order knowledge bases of search engine results, and genetic evolution models of search engines.
FIG. 1 is a schematic diagram of an application scenario of the present invention. As shown in FIG. 1, the apparatus for processing partial order data provided by the present invention includes an acquisition module, a segmentation module, and a processing module.
First, the acquisition module acquires the partial order relation directed graph corresponding to the data set. In the implementation, the directed graph is constructed according to the partial order relationships among the data in the data set; the nodes of the graph correspond to the data identifiers in the data set, and the edges connecting the nodes represent the partial order relationships between them. The problem of detecting noise data is thus converted into the problem of detecting ring structures in the graph. Moreover, based on the characteristics of a ring structure, ring detection can be further simplified to detecting repeated predecessor and successor nodes, which can be done directly on the adjacency list without any traversal, greatly improving the efficiency of noise detection.
Next, the segmentation module cuts the partial order relation directed graph into mutually independent subgraphs. In the implementation, the graph is cut according to queries, with each query corresponding to one subgraph. To address the efficiency of detecting and processing noise data in a massive and complex set of partial order relations, a distributed method of iterative ring traversal can be adopted: the data structure of the noise is first defined and abstracted, and then, because the data under different queries (Q) separates into independent subgraphs, the partial order data under each Q is split into an independent subset.
Finally, when a ring structure corresponding to noise data exists on a subgraph, the processing module prunes edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated. In the implementation, the ring structures in the subgraphs are traversed in a distributed manner; here, traversal means starting from any node of a ring structure and visiting its other nodes in turn to obtain all edges of the ring structure and the weight of each edge. At least one edge of the ring structures of all subgraphs is then cut, in ascending order of weight, until no ring structure remains in the subgraphs.
For example, the Hadoop/MapReduce distributed computing framework can be used to process the subgraphs on multiple computing nodes in parallel, greatly improving the efficiency of processing massive data. Unlike sequential processing, the traversal of each independent subgraph starts from a node of the detected ring structure rather than from a randomly selected node each time, which avoids many invalid traversal operations and improves processing efficiency. Optionally, if several edges of a ring structure have the same weight, edges are pruned according to the principle of the minimum number of pruning operations.
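The text does not spell out how a ring is detected directly from the adjacency list. One standard technique that works on predecessor counts alone, and that may be close to what is meant, is Kahn's algorithm: nodes without predecessors are removed repeatedly, and a ring exists exactly when some nodes can never be removed. The sketch below is this swapped-in technique, not the patent's own procedure; `has_ring` and the dictionary representation are assumptions.

```python
from collections import deque


def has_ring(adjacency):
    """Kahn's algorithm on an adjacency list {node: [successor, ...]}."""
    # Count predecessors (in-degree) of every node, including pure successors.
    indegree = {node: 0 for node in adjacency}
    for successors in adjacency.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    # Repeatedly remove nodes that have no remaining predecessor.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for succ in adjacency.get(node, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)

    # Nodes on a ring always keep a predecessor, so they are never removed.
    return removed < len(indegree)


print(has_ring({"a": ["b"], "b": ["c"], "c": ["a"]}))  # True
print(has_ring({"a": ["b"], "b": ["c"]}))              # False
```

This check only answers whether a ring exists; locating the ring's edges for pruning still needs a traversal that starts inside the ring.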
With this method, the problem of detecting noise in partial order data is converted into the problem of detecting ring structures in the partial order relation graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a method for processing partial order data according to an embodiment of the present invention, as shown in fig. 2, the method in this embodiment may include:
S101, obtaining a partial order relation directed graph corresponding to the data set.
In the embodiment, a partial order relation directed graph is constructed according to the partial order relation among data in a data set; wherein, the nodes in the partial order relationship directed graph correspond to the data identifiers in the data set; the edges connecting the nodes in the partial order relationship directed graph are used for representing the partial order relationship between the nodes.
Specifically, the data set S contains the data u1, u2, u3, u4, u5, u6, u7, and u8. FIG. 3 is a schematic diagram of a partial order relation directed graph provided in an embodiment of the present invention. The directed graph is constructed according to the partial order relationships among the data in the data set S. As shown in FIG. 3, the nodes of the graph correspond to the data identifiers in the data set, and the edges connecting the nodes represent the partial order relationships between them. The graph has two links: u1 -> u2 -> u3 -> u4 -> u5 -> u6 -> u7 -> u3 and u1 -> u2 -> u3 -> u4 -> u8. The link u1 -> u2 -> u3 -> u4 -> u8 is a normal link with no noise. On the link u1 -> u2 -> u3 -> u4 -> u5 -> u6 -> u7 -> u3, a "ring" is detected, because the link returns to u3, which confirms that a noise point has been found on that chain. The problem of detecting noise data is thus converted into the problem of detecting ring structures in the graph. Moreover, based on the characteristics of a ring structure, ring detection can be further simplified to detecting repeated predecessor and successor nodes, which can be done directly on the adjacency list without any traversal, greatly improving the efficiency of noise detection.
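The example above can be sketched in code by storing the two links as an adjacency list and running a depth-first traversal that reports the first ring it meets. This is an illustrative sketch under assumptions: `build_graph` and `find_ring` are hypothetical names, and the traversal variant of ring detection is used here rather than the adjacency-list shortcut.

```python
def build_graph(pairs):
    """Build an adjacency list {node: [successors]} from (higher, lower) pairs."""
    graph = {}
    for src, dst in pairs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def find_ring(graph):
    """Return the nodes of one ring in traversal order, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for succ in graph[node]:
            if color[succ] == GRAY:
                # Reached a node that is already on the path: a ring is closed.
                return path[path.index(succ):]
            if color[succ] == WHITE:
                ring = visit(succ)
                if ring:
                    return ring
        color[node] = BLACK
        path.pop()
        return None

    for node in graph:
        if color[node] == WHITE:
            ring = visit(node)
            if ring:
                return ring
    return None


# Edges of the two links described for FIG. 3.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u5"),
         ("u5", "u6"), ("u6", "u7"), ("u7", "u3"), ("u4", "u8")]
print(find_ring(build_graph(edges)))       # ['u3', 'u4', 'u5', 'u6', 'u7']
print(find_ring(build_graph(edges[:6] + edges[7:])))  # None once u7 -> u3 is dropped
```

The recursion depth grows with the length of the longest link, so a production version would likely use an explicit stack.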
S102, segmenting the partial order relation directed graph into mutually independent subgraphs.
In the embodiment, according to the query, the partial order relationship directed graph is cut into mutually independent sub-graphs; wherein each query corresponds to a subgraph.
Specifically, to address the efficiency of detecting and processing noise data in a massive and complex set of partial order relations, a distributed method of iterative ring traversal may be adopted: the data structure of the noise is first defined and abstracted, and then, because the data under different queries (Q) separates into independent subgraphs, the partial order data under each Q is split into an independent subset. For example, queries Q1 and Q2 are constructed; the partial order data under Q1 is split into the independent subset split0 = {Q1: {U11 > U12, U12 > U13, …}, …}, and the partial order data under Q2 is split into the independent subset split1 = {Q2: {U21 > U22, U22 > U23, …}, …}. Here Q denotes a query, that is, a request expressed as a query statement and sent to a search engine or database in order to find a particular file, website, record, or series of records. The Hadoop/MapReduce distributed computing framework can then be used to process the subsets on multiple computing nodes in parallel, greatly improving the efficiency of processing massive data.
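The split-then-process idea can be illustrated on a single machine. In the sketch below, records are assumed to be (query, higher, lower) triples, and a thread pool stands in for the Hadoop/MapReduce framework named in the text; the record layout, the function names, and the use of `concurrent.futures` are all assumptions for illustration, not the patent's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_query(records):
    """Group (query, higher, lower) records into one independent subset per query."""
    splits = defaultdict(list)
    for query, higher, lower in records:
        splits[query].append((higher, lower))
    return dict(splits)


def has_ring(pairs):
    """Peel off edges whose source has no incoming edge; leftovers imply a ring."""
    edges = set(pairs)
    while edges:
        targets = {dst for _, dst in edges}
        removable = {(src, dst) for src, dst in edges if src not in targets}
        if not removable:
            return True  # every remaining source still has a predecessor
        edges -= removable
    return False


def detect_per_query(records, workers=4):
    """Check every query's subset independently (the 'map' step)."""
    splits = split_by_query(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(has_ring, splits.values()))
    return dict(zip(splits.keys(), flags))


records = [
    ("Q1", "U11", "U12"), ("Q1", "U12", "U13"),
    ("Q2", "U21", "U22"), ("Q2", "U22", "U21"),  # hypothetical conflicting pair
]
print(detect_per_query(records))  # {'Q1': False, 'Q2': True}
```

Because the subsets share no nodes, no coordination between workers is needed, which is what makes the per-query split a natural unit of parallel work.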
And S103, if the annular structure corresponding to the noise data exists on the subgraph, performing subtraction processing on the side of the annular structure according to a preset strategy to obtain the partial order data for eliminating the noise.
In this embodiment, the subtracting the edge of the ring structure according to a preset policy to obtain the partial order data for eliminating noise includes: traversing the ring structure in the subgraph in a distributed processing mode; the traversal processing means that any node in the annular structure is taken as a starting point, other nodes of the annular structure are sequentially traversed, and all edges of the annular structure and the weight value of each edge are obtained; and cutting at least one edge in the ring structures of all subgraphs in the order from small to large according to the weight until no ring structures exist in the subgraphs.
Specifically, a hadoop/MapReduce distributed computing framework can be utilized to process on a plurality of computing nodes simultaneously in parallel, so that the processing efficiency of mass data is greatly improved. In each independent subgraph traversal, different from the sequential processing, the traversal is performed by determining the starting point of the detected ring structure instead of the selected starting point of random each time, so that a plurality of invalid traversal operations can be avoided, and the processing efficiency is improved. Fig. 4 is a schematic diagram of noise elimination according to an embodiment of the present invention, and as shown in fig. 4, in the above step, according to the query, the partial order relationship directed graph is cut into mutually independent subgraphs. The sub-graph corresponding to the query q1 has a ring structure corresponding to the noise data, so that the sub-graph corresponding to the query q1 needs to be denoised. There are two strands in this subgraph, for strand S1: u2> u3, u3> u4 and u4> u2 and chain S2: u2> u1, u1> u3, known to form a ring structure: loop1 ═ u2- > u1- > u3- > u2 and loop2 ═ u2- > u3- > u4- > u 2. Pruning processing is required, the ring structure is broken, and the data quality is improved. Generally, step 1, firstly, a directed graph is segmented according to query granularity, and each query corresponds to an independent subgraph; step 2, adopting an adjacency list for storage, and step 3, traversing each sub-graph with distributed depth first and detecting all edges on a loop; step 4-preferentially prune the least weighted edge on the loop: 2- >3 and 1- >3, when the weights of a plurality of edges are the same, the edge with the minimum cost is removed, and the operation with high efficiency as much as possible is ensured; step 5-loop process until all loops are broken. 
Therefore, in loop2 = u2 -> u3 -> u4 -> u2, the weight W of edge u2 -> u3 is the smallest, so the edge between u2 and u3 is cut, which leaves a new ring loop3 = u2 -> u1 -> u3 -> u4 -> u2. In loop3, the weight W of edge u1 -> u3 is the smallest, so the edge between u1 and u3 is cut, which gives the new graph u3 -> u4 -> u2 -> u1. This graph contains no ring structure, and noise elimination ends.
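The worked example of fig. 4 can be sketched in Python as follows. This is a minimal single-machine sketch, not the patented implementation: the function names `find_ring` and `prune_rings` are hypothetical, and the numeric weights are assumed values chosen only so that u2 -> u3 is the lightest edge and u1 -> u3 the next lightest, as the text states. Rings are found by depth-first search, and the lowest-weight edge of each detected ring is cut until no ring remains.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def find_ring(adj: Dict[str, List[str]]) -> Optional[List[Edge]]:
    """Return the edges of one ring found by depth-first search, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    colour = {n: WHITE for n in nodes}
    path: List[str] = []

    def visit(u: str) -> Optional[List[Edge]]:
        colour[u] = GREY
        path.append(u)
        for v in adj.get(u, []):
            if colour[v] == GREY:  # v is already on the current path: ring found
                ring = path[path.index(v):] + [v]
                return list(zip(ring, ring[1:]))
            if colour[v] == WHITE:
                found = visit(v)
                if found:
                    return found
        path.pop()
        colour[u] = BLACK
        return None

    for n in sorted(nodes):
        if colour[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


def prune_rings(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Cut the lowest-weight edge of each detected ring until none remains."""
    remaining = dict(weights)
    while True:
        adj: Dict[str, List[str]] = {}
        for u, v in remaining:
            adj.setdefault(u, []).append(v)
        ring = find_ring(adj)
        if ring is None:
            return remaining
        weakest = min(ring, key=lambda e: remaining[e])
        del remaining[weakest]


# Subgraph of query q1 from fig. 4; the weights are assumed for illustration.
fig4_weights = {
    ("u2", "u3"): 1.0, ("u3", "u4"): 5.0, ("u4", "u2"): 4.0,  # chain S1
    ("u2", "u1"): 3.0, ("u1", "u3"): 2.0,                     # chain S2
}
pruned = prune_rings(fig4_weights)
print(sorted(pruned))  # remaining edges: u2->u1, u3->u4, u4->u2
```

With these assumed weights the sketch cuts u2 -> u3 first and u1 -> u3 second, leaving the chain u3 -> u4 -> u2 -> u1 described above.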
Optionally, if the ring structure contains edges with the same weight, edges are pruned according to the principle of minimizing the number of pruning operations.
Specifically, if several edges have the same weight, the edge with the lowest cost is cut, that is, edges are pruned so that the number of pruning operations is minimized, which keeps the operation as efficient as possible.
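The text does not define how the "minimum number of pruning operations" is evaluated. One possible reading, shown here purely as an assumption, is that among edges of equal weight the edge lying on the most detected rings is preferred, because cutting it breaks several rings at once. The function name `choose_edge_to_cut` and the example rings are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def choose_edge_to_cut(rings: List[List[Edge]], weights: Dict[Edge, float]) -> Edge:
    """Pick the lowest-weight edge; break ties by how many rings the edge lies on."""
    # Count, for every edge, the number of detected rings that contain it.
    shared = Counter(edge for ring in rings for edge in set(ring))
    # Smaller weight first, then more rings covered, then name for determinism.
    return min(shared, key=lambda e: (weights[e], -shared[e], e))


# Two rings that share the edge b -> c; all weights are equal (assumed data).
rings = [
    [("a", "b"), ("b", "c"), ("c", "a")],
    [("b", "c"), ("c", "d"), ("d", "b")],
]
weights = {edge: 1.0 for ring in rings for edge in ring}
print(choose_edge_to_cut(rings, weights))  # ('b', 'c')
```

Cutting the shared edge b -> c breaks both rings with a single pruning operation, whereas any other choice would require two.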
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Fig. 5 is a flowchart of a method for processing partial order data according to a second embodiment of the present invention, and as shown in fig. 5, the method in this embodiment may include:
S201, obtaining a partial order relationship directed graph corresponding to the data set.
S202, segmenting the partial order relationship directed graph into mutually independent sub-graphs.
S203, judging whether the subgraph has a ring structure corresponding to the noise data.
In this embodiment, whether a ring structure exists on a subgraph can be determined by traversing the nodes of the subgraph; alternatively, the adjacency list of the subgraph is obtained, and whether a ring structure exists on the subgraph is determined from the connection relationships of the nodes in the adjacency list.
Specifically, based on the characteristics of a ring structure, ring detection can be further simplified to detecting repeated predecessor and successor nodes, so that rings can be detected directly from the adjacency list, even without traversal, which greatly improves the efficiency of noise detection. For example, the partial order relationship directed graph in fig. 3 contains the chain u1 -> u2 -> u3 -> u4 -> u5 -> u6 -> u7 -> u3; from the connection relationships of the nodes in the adjacency list, it can be seen that u3 -> u4 -> u5 -> u6 -> u7 -> u3 forms a ring.
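The text does not spell out how rings are read off the adjacency list. As one standard technique that relies only on predecessor and successor information, and which is offered here as an assumption rather than as the method of the invention, nodes that have no predecessor or no successor can be stripped repeatedly; any node that survives lies on a ring or on a path between rings. The function name `nodes_on_rings` is hypothetical.

```python
from typing import Dict, List, Set


def nodes_on_rings(adj: Dict[str, List[str]]) -> Set[str]:
    """Strip nodes lacking a predecessor or a successor; survivors lie on or between rings."""
    succ: Dict[str, Set[str]] = {}
    pred: Dict[str, Set[str]] = {}
    for u, targets in adj.items():
        succ.setdefault(u, set())
        pred.setdefault(u, set())
        for v in targets:
            succ[u].add(v)
            succ.setdefault(v, set())
            pred.setdefault(v, set()).add(u)

    alive = set(succ)
    queue = [n for n in alive if not pred[n] or not succ[n]]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue
        alive.discard(n)
        # Removing n may leave its successors without a predecessor ...
        for v in succ[n]:
            pred[v].discard(n)
            if v in alive and not pred[v]:
                queue.append(v)
        # ... and its predecessors without a successor.
        for u in pred[n]:
            succ[u].discard(n)
            if u in alive and not succ[u]:
                queue.append(u)
    return alive


# Chain from fig. 3: u1 -> u2 -> u3 -> u4 -> u5 -> u6 -> u7 -> u3
fig3 = {"u1": ["u2"], "u2": ["u3"], "u3": ["u4"], "u4": ["u5"],
        "u5": ["u6"], "u6": ["u7"], "u7": ["u3"]}
print(sorted(nodes_on_rings(fig3)))  # ['u3', 'u4', 'u5', 'u6', 'u7']
```

For the chain of fig. 3, u1 and u2 are stripped because they end up with no predecessor, and the nodes u3 to u7 of the ring remain. An empty result means the subgraph contains no ring.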
S204, if a ring structure corresponding to noise data exists on the subgraph, pruning the edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated.
In this embodiment, for the specific implementation and technical principles of steps S201, S202 and S204, refer to the description of steps S101 to S103 of the method shown in fig. 2; details are not repeated here.
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
In addition, this embodiment can determine whether a ring structure exists on a subgraph either by traversing the nodes of the subgraph or by obtaining the adjacency list of the subgraph and examining the connection relationships of the nodes in the adjacency list. This avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Fig. 6 is a schematic structural diagram of a partial order data processing apparatus according to a third embodiment of the present invention, and as shown in fig. 6, the partial order data processing apparatus according to the present embodiment may include:
an obtaining module 31, configured to obtain a partial order relationship directed graph corresponding to a data set;
a segmenting module 32, configured to segment the partial order relationship directed graph into mutually independent sub-graphs;
and a processing module 33, configured to, when a ring structure corresponding to noise data exists on a subgraph, prune the edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated.
In one possible design, the obtaining module 31 is specifically configured to:
constructing a partial order relation directed graph according to the partial order relation among the data in the data set; wherein, the nodes in the partial order relationship directed graph correspond to the data identifiers in the data set; the edges connecting the nodes in the partial order relationship directed graph are used for representing the partial order relationship between the nodes.
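As an illustration of this construction, the following sketch builds the directed graph from pairwise partial order records. The record layout (a pair "higher, lower" of data identifiers) and the use of an occurrence count as the edge weight are assumptions made for the example; the text states only that nodes correspond to data identifiers and edges to partial order relationships. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def build_partial_order_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[Edge, int]:
    """Map each observed relation 'a > b' to a directed edge a -> b.

    The weight is the number of times the relation was observed (an assumption).
    """
    weights: Dict[Edge, int] = defaultdict(int)
    for higher, lower in pairs:
        weights[(higher, lower)] += 1
    return dict(weights)


def to_adjacency_list(weights: Dict[Edge, int]) -> Dict[str, List[str]]:
    """Store the graph as an adjacency list keyed by data identifier."""
    adj: Dict[str, List[str]] = {}
    for u, v in weights:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])  # make sure sink nodes also appear as keys
    return adj


observed = [("u2", "u3"), ("u3", "u4"), ("u2", "u3")]
graph = build_partial_order_graph(observed)
print(graph)                     # {('u2', 'u3'): 2, ('u3', 'u4'): 1}
print(to_adjacency_list(graph))  # {'u2': ['u3'], 'u3': ['u4'], 'u4': []}
```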
In one possible design, the segmenting module 32 is specifically configured to:
split the partial order relationship directed graph by query into mutually independent subgraphs, where each query corresponds to one subgraph.
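A minimal sketch of splitting by query is given below. It assumes, for illustration only, that every partial order record carries the query it was collected under, in the hypothetical layout (query, higher, lower, weight); grouping the records by query then yields one independent subgraph per query.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]
Record = Tuple[str, str, str, float]  # (query, higher_id, lower_id, weight), assumed layout


def split_by_query(records: Iterable[Record]) -> Dict[str, Dict[Edge, float]]:
    """Group weighted edges by query so that each query owns one subgraph."""
    subgraphs: Dict[str, Dict[Edge, float]] = defaultdict(dict)
    for query, higher, lower, weight in records:
        subgraphs[query][(higher, lower)] = weight
    return dict(subgraphs)


records = [
    ("q1", "u2", "u3", 1.0),
    ("q1", "u3", "u4", 5.0),
    ("q2", "u5", "u6", 2.0),
]
for query, subgraph in split_by_query(records).items():
    print(query, subgraph)
```

Because no edge connects nodes of different queries, each subgraph can afterwards be checked for ring structures and pruned without reference to the others.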
In one possible design, the processing module 33 is specifically configured to:
traversing the ring structures in the subgraphs in a distributed manner, where traversal means taking any node of a ring structure as the starting point and visiting the other nodes of the ring structure in turn to obtain all edges of the ring structure and the weight of each edge;
and cutting at least one edge of the ring structures of all subgraphs in ascending order of weight until no ring structure remains in the subgraphs.
In one possible design, the processing module 33 is further configured to: if the ring structure contains edges with the same weight, prune edges according to the principle of minimizing the number of pruning operations.
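Because the subgraphs are mutually independent, the per-subgraph work can be fanned out to parallel workers. The sketch below uses a local thread pool from the Python standard library as a stand-in for a distributed framework such as Hadoop/MapReduce; this substitution, the function name `process_subgraphs`, and the placeholder worker `lightest_edge` are all assumptions. In a real pipeline the worker would perform the ring traversal and pruning for one subgraph.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple, TypeVar

Edge = Tuple[str, str]
Graph = Dict[Edge, float]
T = TypeVar("T")


def process_subgraphs(subgraphs: Dict[str, Graph],
                      worker: Callable[[Graph], T],
                      max_workers: int = 4) -> Dict[str, T]:
    """Apply `worker` to every subgraph concurrently and collect results per query."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {query: pool.submit(worker, graph)
                   for query, graph in subgraphs.items()}
        return {query: future.result() for query, future in futures.items()}


def lightest_edge(graph: Graph) -> Edge:
    """Placeholder worker: report the lowest-weight edge of one subgraph."""
    return min(graph, key=lambda e: graph[e])


subgraphs = {
    "q1": {("u2", "u3"): 1.0, ("u3", "u4"): 5.0, ("u4", "u2"): 4.0},
    "q2": {("u5", "u6"): 2.0, ("u6", "u5"): 3.0},
}
print(process_subgraphs(subgraphs, lightest_edge))
# {'q1': ('u2', 'u3'), 'q2': ('u5', 'u6')}
```

No locking is needed in this sketch because each worker reads only its own subgraph, which mirrors the reason the text gives for splitting the graph before processing.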
The partial order data processing apparatus of this embodiment can execute the technical solution of the method shown in fig. 2; for the specific implementation and technical principles, refer to the description of the method shown in fig. 2. Details are not repeated here.
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Fig. 7 is a schematic structural diagram of a partial order data processing apparatus according to a fourth embodiment of the present invention, and as shown in fig. 7, the partial order data processing apparatus according to the present embodiment may further include, on the basis of the apparatus shown in fig. 6:
a judging module 34, configured to judge whether a ring structure corresponding to noise data exists on the subgraph.
In one possible design, the judging module 34 is specifically configured to:
determine whether a ring structure exists on the subgraph by traversing the nodes of the subgraph; or obtain the adjacency list of the subgraph and determine whether a ring structure exists on the subgraph from the connection relationships of the nodes in the adjacency list.
The device for processing partial order data of this embodiment may execute the technical solutions in the methods shown in fig. 2 and fig. 5, and the specific implementation process and technical principle of the device refer to the relevant descriptions in the methods shown in fig. 2 and fig. 5, which are not described herein again.
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
In addition, this embodiment can determine whether a ring structure exists on a subgraph either by traversing the nodes of the subgraph or by obtaining the adjacency list of the subgraph and examining the connection relationships of the nodes in the adjacency list. This avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Fig. 8 is a schematic structural diagram of a partial order data processing system according to a fifth embodiment of the present invention, and as shown in fig. 8, the partial order data processing system 40 of this embodiment may include: a processor 41 and a memory 42.
The memory 42 is configured to store programs. The memory 42 may include volatile memory, for example random access memory (RAM) such as static random access memory (SRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory 42 stores computer programs (for example, application programs and functional modules that implement the above methods), computer instructions, and the like, which may be stored in partitions in one or more memories 42 and may be called by the processor 41.
The processor 41 is configured to execute the computer program stored in the memory 42 to implement the steps of the methods of the above embodiments.
For details, refer to the description of the preceding method embodiments.
The processor 41 and the memory 42 may be separate structures or may be integrated into a single structure. When the processor 41 and the memory 42 are separate structures, the memory 42 and the processor 41 may be coupled by a bus 43.
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
The processing system for partial order data of this embodiment may execute the technical solutions in the methods shown in fig. 2 and fig. 5, and the specific implementation process and technical principle of the processing system refer to the relevant descriptions in the methods shown in fig. 2 and fig. 5, which are not described herein again.
In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.
In this embodiment, a partial order relationship directed graph corresponding to a data set is obtained; the directed graph is split into mutually independent subgraphs; and, if a subgraph contains a ring structure corresponding to noise data, the edges of the ring structure are pruned according to a preset strategy to obtain partial order data with the noise eliminated. The invention thus converts the problem of detecting noise in partial order data into the problem of detecting ring structures in the partial order relationship graph, which avoids many invalid traversal operations, effectively improves data processing efficiency, and meets the processing requirements of massive data.
Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
The present application further provides a program product comprising a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of a server, the execution of the computer program by the at least one processor causing the server to carry out the method of any of the embodiments of the invention described above.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing partial order data is characterized by comprising the following steps:
acquiring a partial order relation directed graph corresponding to a data set;
cutting the partial order relationship directed graph into mutually independent sub-graphs;
and if a ring structure corresponding to noise data exists on a subgraph, pruning the edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated.
2. The method of claim 1, wherein obtaining a partial order relationship directed graph corresponding to a data set comprises:
constructing a partial order relation directed graph according to the partial order relation among the data in the data set; wherein, the nodes in the partial ordering relationship directed graph correspond to the data identifiers in the data set; and the edges connecting the nodes in the partial order relationship directed graph are used for representing the partial order relationship among the nodes.
3. The method of claim 1, wherein cutting the partial order relationship directed graph into mutually independent sub-graphs comprises:
according to the query, cutting the partial order relationship directed graph into mutually independent sub-graphs; wherein each query corresponds to a subgraph.
4. The method according to any one of claims 1-3, further comprising:
and judging whether a ring structure corresponding to the noise data exists on the subgraph.
5. The method of claim 4, wherein the determining whether a ring structure corresponding to noise data exists on the subgraph comprises:
determining whether a ring structure exists on the subgraph by traversing the nodes of the subgraph; or obtaining the adjacency list of the subgraph and determining whether a ring structure exists on the subgraph from the connection relationships of the nodes in the adjacency list.
6. The method of claim 1, wherein pruning the edges of the ring structure according to a preset strategy to obtain partial order data with the noise eliminated comprises:
traversing the ring structures in the subgraphs in a distributed manner, where traversal means taking any node of a ring structure as the starting point and visiting the other nodes of the ring structure in turn to obtain all edges of the ring structure and the weight of each edge;
and cutting at least one edge of the ring structures of all subgraphs in ascending order of weight until no ring structure remains in the subgraphs.
7. The method of claim 6, further comprising: if the ring structure contains edges with the same weight, pruning edges according to the principle of minimizing the number of pruning operations.
8. An apparatus for processing partial order data, comprising:
the acquisition module is used for acquiring a partial order relation directed graph corresponding to the data set;
the segmentation module is used for segmenting the partial order relation directed graph into mutually independent sub-graphs;
and the processing module is used for pruning the edges of the ring structure according to a preset strategy, when a ring structure corresponding to noise data exists on a subgraph, to obtain partial order data with the noise eliminated.
9. A system for processing partial order data, comprising: a memory and a processor, wherein the memory stores instructions executable by the processor; and wherein the processor is configured to perform the method of processing partial order data of any one of claims 1-7 by executing the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of processing partial order data according to any one of claims 1 to 7.
CN201910538902.3A 2019-06-20 2019-06-20 Partial order data processing method, device and system and storage medium Pending CN112115304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910538902.3A CN112115304A (en) 2019-06-20 2019-06-20 Partial order data processing method, device and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910538902.3A CN112115304A (en) 2019-06-20 2019-06-20 Partial order data processing method, device and system and storage medium

Publications (1)

Publication Number Publication Date
CN112115304A true CN112115304A (en) 2020-12-22

Family

ID=73796566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910538902.3A Pending CN112115304A (en) 2019-06-20 2019-06-20 Partial order data processing method, device and system and storage medium

Country Status (1)

Country Link
CN (1) CN112115304A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529323A (en) * 2016-01-21 2017-03-22 华南师范大学 Multilevel security model access control data fusion method
CN106802939A (en) * 2016-12-30 2017-06-06 华为技术有限公司 A kind of method and system of resolving data conflicts
CN108540427A (en) * 2017-03-02 2018-09-14 株式会社理光 Collision detection method and detection device, access control method and access control apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张红斌等: "访问控制策略一致性和完备性检测方法研究", 河北工业科技, vol. 35, no. 5, pages 305 - 310 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination