CN111046242B - Data processing method, device, equipment and medium - Google Patents



Publication number
CN111046242B
Authority
CN
China
Prior art keywords
node
object node
downstream
carried
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911183944.6A
Other languages
Chinese (zh)
Other versions
CN111046242A (en
Inventor
竟银行
吴云广
周刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911183944.6A
Publication of CN111046242A
Application granted
Publication of CN111046242B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present specification provide a data processing method, apparatus, device, and medium. In one embodiment, a data processing method includes: acquiring a plurality of object nodes and the blood relationships (that is, the data lineage) among the plurality of object nodes, where the blood relationships represent the upstream and downstream relationships among the objects and each object node carries a mark set; determining a target object node among the plurality of object nodes, where the mark set carried by the target object node is a non-empty set; querying, among the plurality of object nodes, a downstream object node that has a blood relationship with the target object node; adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node; and updating the downstream object node to be the target object node and continuing to query, among the plurality of object nodes, for downstream object nodes that have a blood relationship with the target object node until a preset condition is met.

Description

Data processing method, device, equipment and medium
Technical Field
One or more embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
With the development of internet technology, the types of business products are rapidly expanding, and a large amount of data can be generated in the use process of the business products.
In order to prevent leakage of target data fields, such as sensitive data, the gateway interfaces, service systems, data fields, and so on that are related to a target data field are generally marked manually. However, manual marking is prone to wrong marks and missing marks. The resulting marking errors reduce the reliability of monitoring the target data field and, in turn, reduce data security.
Disclosure of Invention
One or more embodiments of the present disclosure provide a data processing method, apparatus, device, and medium that can add marks to object nodes by using the blood relationships among a plurality of object nodes, thereby improving the reliability of marking the object nodes.
One or more embodiments of the present disclosure provide the following technical solutions:
In a first aspect, a data processing method is provided, including:
acquiring a plurality of object nodes and the blood relationships among the plurality of object nodes, where the blood relationships represent the upstream and downstream relationships among the objects, and each object node carries a mark set;
determining a target object node among the plurality of object nodes, where the mark set carried by the target object node is a non-empty set;
querying, among the plurality of object nodes, a downstream object node that has a blood relationship with the target object node;
adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node;
updating the downstream object node to be the target object node, and continuing to query, among the plurality of object nodes, for downstream object nodes that have a blood relationship with the target object node until a preset condition is met.
In a second aspect, a data processing apparatus is provided, the apparatus including:
an object acquisition module, configured to acquire a plurality of object nodes and the blood relationships among the plurality of object nodes, where the blood relationships represent the upstream and downstream relationships among the objects, and each object node carries a mark set;
a target determining module, configured to determine a target object node among the plurality of object nodes, where the mark set carried by the target object node is a non-empty set;
an object query module, configured to query, among the plurality of object nodes, a downstream object node that has a blood relationship with the target object node;
a mark adding module, configured to add the marks in the mark set carried by the target object node to the mark set carried by the downstream object node;
an iteration determining module, configured to update the downstream object node to be the target object node and cause the object query module to continue querying, among the plurality of object nodes, for downstream object nodes that have a blood relationship with the target object node until a preset condition is met.
In a third aspect, a data processing device is provided, the device including: a processor and a memory storing computer program instructions;
the processor when executing the computer program instructions implements the data processing method as described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, wherein computer program instructions are stored on the computer-readable storage medium, which when executed by a processor implement the data processing method according to the first aspect.
According to one or more embodiments of the present disclosure, a plurality of object nodes and the blood relationships representing the upstream and downstream relationships among them can be acquired, and the target object nodes, whose carried mark sets are non-empty, can be determined among them. Iterative queries are then performed based on the blood relationships, and each target object node is used to add marks to the downstream object nodes that have a blood relationship with it, so that marking of the plurality of object nodes is complete once a preset condition is met. Because the marking relies on the blood relationships among the object nodes, it is more reliable, wrong or missing marks are avoided, and marking efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of one or more embodiments of the present disclosure, the following description will briefly explain the drawings required to be used in one or more embodiments of the present disclosure, and it will be apparent to those skilled in the art that other drawings may be obtained from these drawings without inventive effort.
FIG. 1 is a diagram of an example Internet access system architecture of the present specification;
FIG. 2 is a flow chart of a data processing method according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a blood relationship network diagram according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a blood relationship network diagram according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a data processing apparatus according to one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the hardware structure of a data processing device according to an embodiment of the present disclosure.
Detailed Description
Features and exemplary embodiments of various aspects of the present description are described in detail below, and in order to make the objects, technical solutions and advantages of the present description more apparent, the present description is described in further detail below with reference to the accompanying drawings and the specific embodiments. It should be understood that the embodiments described herein are only some, but not all, of the embodiments of the present description. It will be apparent to one skilled in the art that the present description may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present description by showing examples of the present description.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
Fig. 1 shows an internet access system architecture diagram of one example of the present specification. As shown in fig. 1, the internet access system includes a gateway interface 120, an accessed service system server 130, and a database server 140 for the service system server 130 to retrieve data fields. The user terminal 110 may access the service system server 130 and the database server 140 through the gateway interface 120.
In this specification, the user terminal may specifically be a mobile phone, a tablet computer, a personal computer, or the like. The service system server and the database server may each be a high-performance computer for storing and processing data.
Taking the example that the user terminal 110 completes one query service by using the internet access system shown in fig. 1, the user terminal 110 accesses the service system server 130 providing the query service through the gateway interface 120, and the service system server 130 invokes different database servers 140 to perform data query and data processing based on the received query statement sent by the user terminal 110, so as to generate a new data table based on the queried data and the processed data.
In order to prevent leakage of target data fields, such as sensitive data, in the database server 140, the gateway interfaces, service systems, data fields, generated new data tables, and so on that are related to the target data fields involved in the query service are generally marked manually. However, manual marking is prone to wrong marks and missing marks. The resulting marking errors reduce the reliability of monitoring the target data field and, in turn, reduce data security.
While the Internet access system is in use, the blood relationships of the full-link data can be constructed by connecting data such as online data, data warehouse data, report application data, and external service gateway interfaces, so that the interdependence among different data can be indicated reliably. To solve the problems in the prior art, one or more embodiments of the present disclosure provide a data processing method, apparatus, device, and medium that mark the gateway interfaces, service systems, data fields, and so on related to a target data field based on the blood relationships of the full-link data. The data processing method provided in this specification is described first.
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present disclosure. The data processing method shown in fig. 2 may be performed by the service system server 130 or the database server 140 in the internet access system shown in fig. 1, or may be performed by an offline server not shown in the internet access system shown in fig. 1.
As shown in fig. 2, the data processing method may include:
S210, acquiring a plurality of object nodes and the blood relationships among the plurality of object nodes, where the blood relationships represent the upstream and downstream relationships among the objects, and each object node carries a mark set;
S220, determining a target object node among the plurality of object nodes, where the mark set carried by the target object node is a non-empty set;
S230, querying, among the plurality of object nodes, a downstream object node that has a blood relationship with the target object node;
S240, adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node;
S250, updating the downstream object node to be the target object node, and continuing to query, among the plurality of object nodes, for downstream object nodes that have a blood relationship with the target object node until a preset condition is met.
According to one or more embodiments of the present disclosure, a plurality of object nodes and the blood relationships representing the upstream and downstream relationships among them can be acquired, and the target object nodes, whose carried mark sets are non-empty, can be determined among them. Iterative queries are then performed based on the blood relationships, and each target object node is used to add marks to the downstream object nodes that have a blood relationship with it, so that marking of the plurality of object nodes is complete once a preset condition is met. Because the marking relies on the blood relationships among the object nodes, it is more reliable, wrong or missing marks are avoided, and marking efficiency is improved.
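Steps S210 to S250 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `propagate_marks`, the adjacency dictionary `downstream`, and the `max_rounds` limit are hypothetical names chosen here, and the loop stops either when a round adds no new mark or when the round limit is reached.

```python
from typing import Dict, List, Set


def propagate_marks(
    downstream: Dict[int, List[int]],
    marks: Dict[int, Set[int]],
    max_rounds: int = 100,
) -> Dict[int, Set[int]]:
    """Push marks from marked nodes to their downstream nodes (illustrative only)."""
    # S220: the target nodes are the nodes whose mark set is non-empty.
    targets = {node for node, node_marks in marks.items() if node_marks}
    for _ in range(max_rounds):  # assumed stop condition: round limit
        next_targets: Set[int] = set()
        added = False
        for node in targets:
            for child in downstream.get(node, []):  # S230: query downstream nodes
                child_marks = marks.setdefault(child, set())
                new_marks = marks[node] - child_marks
                if new_marks:
                    child_marks |= new_marks  # S240: add the target's marks
                    added = True
                next_targets.add(child)
        if not added:  # assumed stop condition: nothing new was added
            break
        targets = next_targets  # S250: downstream nodes become the targets
    return marks
```

For a graph in which node 0 points to nodes 1 and 2, and both of those point to node 3, starting from `marks = {0: {0}}` leaves the mark 0 on all four nodes, while an unconnected node keeps an empty mark set.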
In an embodiment of the present specification, optionally, the object node includes at least one of a gateway node, a service node, and a field node.
For example, if the blood relationship represents an online data blood relationship, the object nodes may include service nodes corresponding to service systems and field nodes corresponding to data fields in the databases accessed by the service systems, so that the online data blood relationship represents the upstream and downstream relationships among the different service nodes and field nodes.
For another example, if the blood relationship represents a data report system blood relationship, the object nodes may include field nodes corresponding to data fields in the databases of a data warehouse system and field nodes corresponding to data fields in the data reports of a data report system, so that the data report system blood relationship represents the upstream and downstream relationships among the different field nodes.
For another example, if the blood relationship represents a data blood relationship, the object nodes may include field nodes corresponding to data fields in different databases, so that the data blood relationship represents the upstream and downstream relationships among the different field nodes.
For another example, if the blood relationship represents an external-service (out-service) blood relationship, the upstream and downstream relationships between gateway nodes, which correspond to the gateway interfaces used to access the service systems, and the service nodes and field nodes of the online data blood relationship may be added on top of the online data blood relationship.
In some embodiments of the present disclosure, the specific method of S210 may include:
acquiring a blood relationship data table;
parsing the node information and edge information in the blood relationship data table to obtain the plurality of object nodes and the blood relationships among the plurality of object nodes.
In these embodiments, a blood relationship data table corresponding to the Internet access system may first be acquired. The blood relationship data table consists of multiple pieces of node information and multiple pieces of edge information. Each piece of node information includes at least the node identifier (Identity, ID) of the corresponding object node, a mark set, an iteration state, and an outgoing edge set; each piece of edge information includes at least the node identifier of the object node to which the edge structure points and the weight of the edge structure. Specifically, the mark set of an object node represents the target data fields associated with the object node, the iteration state of an object node indicates whether the object node is to stop iterating, and the outgoing edge set of an object node includes all edge structures that start from the object node.
In some embodiments of the present description, the target data field may include a sensitive data field.
In this embodiment of the present disclosure, a blood relationship data table in a two-dimensional table format may be acquired and its node information and edge information parsed. Each piece of node information corresponds to one object node, and the mark set carried by that object node is determined from the node information. The outgoing edge set in each piece of node information corresponds to all edge structures that start from the object node, and the downstream object nodes of the object node are found from the edge information of each edge structure. A blood relationship network graph is created in this way, and the plurality of object nodes and the blood relationships among them are obtained from the graph.
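The node information and edge information described above map naturally onto two small record types. The following Python sketch assumes, purely for illustration, that each table row is a tuple of (node ID, marks, iteration state, outgoing edges) and that each outgoing edge is a (target node ID, weight) pair; the specification does not fix a concrete row layout, and the names `Edge`, `ObjectNode`, `build_graph`, and `downstream_ids` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Edge:
    target_id: int  # node ID of the object node the edge points to
    weight: float   # weight of the edge structure


@dataclass
class ObjectNode:
    node_id: int
    marks: Set[int] = field(default_factory=set)         # mark set
    halted: bool = False                                  # iteration state (True = end state)
    out_edges: List[Edge] = field(default_factory=list)  # outgoing edge set


# Assumed row layout: (node_id, marks, halted, [(target_id, weight), ...])
Row = Tuple[int, Iterable[int], bool, Iterable[Tuple[int, float]]]


def build_graph(rows: Iterable[Row]) -> Dict[int, ObjectNode]:
    """Parse table rows into a node-ID-indexed graph."""
    graph: Dict[int, ObjectNode] = {}
    for node_id, marks, halted, out_edges in rows:
        edges = [Edge(target, weight) for target, weight in out_edges]
        graph[node_id] = ObjectNode(node_id, set(marks), halted, edges)
    return graph


def downstream_ids(graph: Dict[int, ObjectNode], node_id: int) -> List[int]:
    """Return the IDs of the nodes directly downstream of node_id."""
    return [edge.target_id for edge in graph[node_id].out_edges]
```

A row such as `(0, [0], False, [(1, 5), (2, 10)])` would describe a node with ID 0 that carries the mark 0 and has two outgoing edges with weights 5 and 10; the edge targets 1 and 2 are made up for the example.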
Fig. 3 is a schematic structural diagram of a blood relationship network graph according to an embodiment of the present disclosure. As shown in Fig. 3, the graph includes five object nodes with node IDs 0, 1, 2, 3, and 5. Taking the object node with node ID 0 as an example, the mark set carried by the object node contains the mark 0, the iteration state of the object node may be either a non-end state or an end state, and the weights of the edge structures in the outgoing edge set of the object node are 5 and 10, respectively.
In some embodiments of the present description, the blood relationship data table may be determined based on structured query language (Structured Query Language, SQL) code information received through user access to the Internet access system and log information generated during data operations.
First, the gateway interfaces, data fields, and service systems involved are determined based on the SQL code information and the log information, and each is taken as one object node. Next, the blood relationships among the object nodes are determined based on the SQL code information and the log information. Finally, the object nodes and their blood relationships are converted into node information and edge information in a two-dimensional table format, and the converted data are stored in the blood relationship data table.
In this way, the blood relationships among the object nodes can be obtained from the SQL code information and the log information, and because the blood relationship data table is in a two-dimensional table format, the resource usage and the amount of data processed when marking the object nodes can be reduced, which improves data processing efficiency.
In other embodiments of the present disclosure, the specific method of S210 may further include:
determining the gateway interfaces, data fields, and service systems involved based on the SQL code information and the log information, taking each as one object node, and determining the blood relationships among the object nodes based on the SQL code information and the log information.
In the embodiment of the present disclosure, the object nodes and the blood relationships among them may be determined from the SQL code information and the log information by using an existing method for determining blood relationships, which is not described herein.
In this way, the blood relationships among the object nodes can be obtained from the SQL code information and the log information, and because the gateway interfaces, data fields, and service systems are replaced by the node IDs of the corresponding object nodes, the resource usage and the amount of data processed when marking the object nodes can be reduced, which improves data processing efficiency.
Before S220 of some embodiments of the present disclosure, an initialization process may also be performed on a set of labels carried by a plurality of object nodes. Taking the example of marking the sensitive data fields of a plurality of object nodes, the mark corresponding to the sensitive data field can be added for the mark set carried by the object node associated with the sensitive data field.
Specifically, before S220, it may further include:
among the plurality of object nodes, determining the object node whose node identifier is the same as the node identifier of the object node to which the target data field belongs;
adding the node identifier of the determined object node to the mark set carried by that object node.
The target data field may be a data field manually marked as a sensitive data field, or may be a data field in the underlying database that a marking system has automatically identified as a sensitive data field.
Because each data field in each database of the Internet access system corresponds one-to-one to an object node, the node ID of the object node to which a target data field belongs can be determined and used to check whether an object node with the same node ID exists among the object nodes to be marked. If such an object node exists, its node ID is added to the mark set it carries. Once every target data field has been looked up among the object nodes to be marked, the initialization of the mark sets carried by the object nodes is complete.
Refer again to the blood relationship network graph shown in Fig. 3. Taking the object node with node ID 0 as an example, because the node ID "0" is the same as the node ID of the object node to which the target data field belongs, the node ID "0" is added to the mark set carried by that object node.
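The initialization step can be illustrated with a minimal Python sketch. It assumes that mark sets are stored in a dictionary keyed by node ID and that the node IDs of the object nodes owning target data fields are already known; the function name `initialize_marks` and both parameter names are hypothetical.

```python
from typing import Dict, Iterable, Set


def initialize_marks(
    node_ids: Iterable[int],
    target_field_node_ids: Iterable[int],
) -> Dict[int, Set[int]]:
    """Give every node an empty mark set, then seed the nodes that own target data fields."""
    marks: Dict[int, Set[int]] = {node_id: set() for node_id in node_ids}
    for node_id in target_field_node_ids:
        # Only seed nodes that are actually among the nodes to be marked.
        if node_id in marks:
            marks[node_id].add(node_id)
    return marks
```

With the node IDs 0, 1, 2, 3, and 5 and a single target data field that belongs to node 0, only node 0 ends up with a non-empty mark set, namely `{0}`.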
In S220 of some embodiments of the present disclosure, after the mark sets carried by the plurality of object nodes have been initialized, only the mark sets carried by the object nodes to which target data fields belong are non-empty, and the mark sets carried by the other object nodes are still empty. The object nodes whose carried mark sets are non-empty may therefore be determined as the target object nodes, and these target object nodes are then used to add marks to the object nodes that have a direct or indirect blood relationship with them.
In some embodiments of the present description, S230-S240 form one complete query iteration. Each query iteration corresponds to one "SuperStep", and each query iteration traverses all target object nodes. In the first query iteration, the target object nodes are those determined in S220; in each subsequent query iteration, the target object nodes are the downstream object nodes found in the previous query iteration. Because a downstream object node found in the previous query iteration has received a message, sent by its upstream target object node, that contains the mark set carried by that upstream node, its iteration state is set to the non-end state, while the iteration state of the target object node of the previous query iteration is set to the end state. The target object nodes of each query iteration can therefore also be regarded as all object nodes whose iteration state is the non-end state.
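One query iteration in this message-passing style can be sketched as follows. The sketch is an assumption-laden illustration rather than the API of any particular graph platform: `run_superstep`, `out_edges`, and `active` are hypothetical names, the set `active` stands for the nodes in the non-end state, and the returned set holds the nodes that received a message and are therefore in the non-end state for the next iteration.

```python
from typing import Dict, List, Set


def run_superstep(
    out_edges: Dict[int, List[int]],
    marks: Dict[int, Set[int]],
    active: Set[int],
) -> Set[int]:
    """Run one query iteration and return the nodes that are active in the next one."""
    inbox: Dict[int, Set[int]] = {}
    # Each active (non-end state) node sends its mark set to its downstream nodes.
    for node in active:
        for child in out_edges.get(node, []):
            inbox.setdefault(child, set()).update(marks.get(node, set()))
    # Each receiver merges the received marks into its own mark set.
    for child, received in inbox.items():
        marks.setdefault(child, set()).update(received)
    # Senders move to the end state; only the receivers stay in the non-end state.
    return set(inbox)
```

Calling the function repeatedly, feeding each returned set back in as `active`, walks the graph one hop per iteration until the returned set is empty.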
In some embodiments of the present disclosure, the specific method of S230 is: taking the target object node as the starting point, using the edge structures corresponding to the target object node to find the object node that each edge structure points to, and taking those object nodes as the downstream object nodes of the target object node.
In some embodiments of the present disclosure, the specific method of S240 may include:
determining whether the mark set carried by the downstream object node is an empty set;
if the mark set carried by the downstream object node is an empty set, adding both the node identifier corresponding to the downstream object node and the marks in the mark set carried by the target object node to the mark set carried by the downstream object node;
if the mark set carried by the downstream object node is a non-empty set, adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node.
If a downstream object node is found for the first time, it has not been marked before, so the mark set it carries is an empty set. After the downstream object node has been found for the first time, its node ID is needed in order to find the other object nodes that must be marked, that is, the object nodes that have a data dependency on it. Therefore, when the mark set carried by the downstream object node is an empty set, the node ID of the downstream object node must be added to its mark set in addition to the marks in the mark set carried by the target object node.
If the mark set carried by the downstream object node is a non-empty set, the downstream object node has already been marked. Its node ID does not need to be added to its mark set again; only the marks in the mark set carried by the target object node need to be added.
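The two branches of this variant of S240 can be sketched as a single small function. The name `add_marks_on_visit` and its parameters are hypothetical, and mark sets are assumed to be Python sets of node IDs.

```python
from typing import Set


def add_marks_on_visit(
    target_marks: Set[int],
    downstream_id: int,
    downstream_marks: Set[int],
) -> Set[int]:
    """Add the target's marks; on the first visit also add the node's own ID."""
    if not downstream_marks:
        # First visit: the node has no marks yet, so record its own ID as well.
        downstream_marks.add(downstream_id)
    downstream_marks.update(target_marks)
    return downstream_marks
```

For a target carrying `{0}`, a previously unmarked downstream node 1 ends up with `{0, 1}`, whereas a downstream node that already carries `{4}` ends up with `{0, 4}` and its own ID is not added.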
In other embodiments of the present disclosure, the specific method of S240 may further include:
adding, to the mark set carried by the downstream object node, those marks in the mark set carried by the target object node that are not already in the mark set carried by the downstream object node.
If the mark set carried by the downstream object node is a non-empty set, the downstream object node has already been marked. Because each mark in a mark set corresponds to one node ID, and each node ID corresponds to one target data field, only the marks in the target object node's mark set that are not yet present in the downstream object node's mark set are added. This deduplicates the marks in the downstream object node's mark set, so that each associated target data field is marked only once, which reduces the amount of data processed both during marking and when the marking result is later used for data processing.
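The deduplicating variant amounts to a set difference followed by an in-place union. In the sketch below, the hypothetical function `add_missing_marks` also returns the marks that were actually new, which is an illustrative design choice that lets a caller tell whether the downstream mark set changed.

```python
from typing import Set


def add_missing_marks(target_marks: Set[int], downstream_marks: Set[int]) -> Set[int]:
    """Add only the marks the downstream node lacks and return the ones that were added."""
    missing = target_marks - downstream_marks
    downstream_marks |= missing  # in-place update of the downstream mark set
    return missing
```

Applying the same target mark set twice adds nothing the second time, so each target data field is recorded once per downstream node.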
In some embodiments of the present disclosure, the preset condition described in S250 may include: the number of queries reaches a preset number. The query iteration thus ends when the number of queries reaches the preset number, which bounds the number of query iterations and prevents the marking of the object nodes from taking too long.
In other embodiments of the present disclosure, the preset condition described in S250 may include: no new mark has been added to the mark set carried by any downstream object node.
Where only the marks that are not already in a downstream object node's mark set are added, if no new mark has been added to the mark set of any downstream object node after a query iteration, the query iteration can end, which improves the reliability of marking the object nodes.
In some embodiments of the present disclosure, if the blood relationship represents an external-service (out-service) blood relationship, after the preset condition is met, the data processing method may further include:
when an object node whose carried mark set is non-empty is a gateway node, determining the gateway interface corresponding to the gateway node;
performing content security detection on the data transmitted through the gateway interface.
Specifically, if an object node whose carried mark set is non-empty is a gateway node, it can be determined that the gateway interface corresponding to the gateway node exposes the target data field. Content security detection therefore needs to be performed on the data transmitted through the gateway interface, and data whose detection result does not meet the security detection requirement is intercepted.
In other embodiments of the present disclosure, after the preset condition is met, the data processing method may further include:
determining, when an object node carrying a non-empty label set is a field node, the data field corresponding to the field node;
if the data field is a data field in a data report, setting user permissions for the data report;
if the data field is a data field in a database, desensitizing the data field.
Specifically, if the blood-edge relationship represents the blood-edge relationship of the data report system and an object node carrying a non-empty label set is a field node corresponding to a data report, it can be determined that the data report system reveals the target data field, and user permissions need to be set for the data report so that only users with permission can view the target data field. If the blood-edge relationship represents the data blood-edge relationship and an object node carrying a non-empty label set is a field node corresponding to a database, it can be determined that the target data field is revealed in the database, and the data fields in the database that are associated with the target data field need to be desensitized.
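The choice of protective measure after labeling can be sketched as a simple dispatch on node type. The function name `plan_protection`, the dictionary keys `"id"`, `"kind"` and `"location"`, and the returned action names are all hypothetical and serve only to illustrate the branching described above.

```python
def plan_protection(node, labels):
    """Return the protective action suggested for a labeled node.

    node: dict with hypothetical keys "id", "kind" ("gateway", "service" or
          "field") and, for field nodes, "location" ("report" or "database").
    labels: dict mapping node ID -> set of labels.
    """
    if not labels.get(node["id"]):
        return "none"  # empty label set: no target data field reaches this node
    if node["kind"] == "gateway":
        return "content_security_detection"
    if node["kind"] == "field":
        if node.get("location") == "report":
            return "set_report_permission"
        if node.get("location") == "database":
            return "desensitize_field"
    return "none"
```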
In the present specification, the above-described data processing method may be implemented using an ODPS Graph data processing platform. The ODPS Graph data processing platform may include a plurality of processing units, each processing unit being a worker (Worker), and performs distributed processing on the plurality of object nodes using the plurality of processing units.
In some embodiments of the present disclosure, where the target object node is located in a first processing unit of the plurality of processing units and the downstream object node is located in a second processing unit of the plurality of processing units, S240 may include:
causing the first processing unit to asynchronously transmit the label set carried by the target object node to the second processing unit;
causing the second processing unit to add the labels in the received label set to the label set carried by the downstream object node.
Specifically, the ODPS Graph data processing platform may implement asynchronous data transmission between multiple processing units through a frame server. Each processing unit only carries out query iteration on the object nodes positioned in the processing unit, so that a large number of object nodes can be processed in a scattered manner, the resource occupancy rate during data processing is reduced, and the data processing efficiency is improved.
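The hand-off between two processing units can be illustrated with a small in-process simulation. This sketch does not use the ODPS Graph API; the `Worker` class, its `inbox` queue, and the `send` and `drain` methods are hypothetical names chosen to show the idea that the sender enqueues a label set and continues, while the receiver merges it later.

```python
from collections import deque


class Worker:
    """Holds the label sets of the nodes assigned to one processing unit."""

    def __init__(self, labels):
        self.labels = labels   # node ID -> set of labels, local nodes only
        self.inbox = deque()   # messages waiting to be applied

    def send(self, other, downstream_id, marks):
        # Enqueue a copy and return at once; the sender does not wait
        # for the receiver to apply the labels.
        other.inbox.append((downstream_id, set(marks)))

    def drain(self):
        """Apply all queued messages; return the IDs of nodes that changed."""
        changed = set()
        while self.inbox:
            node_id, marks = self.inbox.popleft()
            local = self.labels.setdefault(node_id, set())
            new_marks = marks - local
            if new_marks:
                local |= new_marks
                changed.add(node_id)
        return changed
```

In this sketch the downstream node's label set stays unchanged until its own worker calls `drain`, which mirrors the separation between transmitting a label set and adding its labels.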
In these embodiments, after S210, the data processing method may further include:
Determining a hash value of a node identifier corresponding to each object node;
performing modulo calculation on the hash value corresponding to each object node and the number of the processing units to obtain a remainder corresponding to the object node;
and distributing the object node to a processing unit corresponding to the remainder according to the remainder corresponding to each object node.
Fig. 4 is a schematic structural diagram of a blood relationship network diagram according to another embodiment of the present disclosure. As shown in Fig. 4, the ODPS Graph data processing platform may include a first processing unit (Worker 0) and a second processing unit (Worker 1); that is, the identifier of the first processing unit is 0, the identifier of the second processing unit is 1, and the number of processing units is 2. The hash values corresponding to the node IDs of the five object nodes are 0, 1, 2, 3 and 5. Taking each hash value modulo the number of processing units, the object nodes with hash values 0 and 2 have a remainder of 0, so these two object nodes are allocated to Worker 0; the object nodes with hash values 1, 3 and 5 have a remainder of 1, so these three object nodes are allocated to Worker 1.
Thus, it is possible to quickly match each object node to a corresponding processing unit using its corresponding node identifier.
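The allocation rule can be checked with a short Python sketch that uses the five hash values from the Fig. 4 example. The function name `assign_workers` and the node names `n0` to `n5` are hypothetical; only the hash values and the number of processing units come from the example.

```python
def assign_workers(node_hashes, num_workers):
    """Map each node to the worker whose index equals hash mod num_workers."""
    assignment = {worker: [] for worker in range(num_workers)}
    for node_id, hash_value in node_hashes.items():
        assignment[hash_value % num_workers].append(node_id)
    return assignment


# Hash values from the Fig. 4 example; the node names are hypothetical.
fig4_hashes = {"n0": 0, "n1": 1, "n2": 2, "n3": 3, "n5": 5}
fig4_assignment = assign_workers(fig4_hashes, 2)
# fig4_assignment == {0: ["n0", "n2"], 1: ["n1", "n3", "n5"]}
```

If node identifiers are strings, a hash that is stable across processes would be needed, since Python's built-in `hash()` of strings can differ from one interpreter run to the next.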
Fig. 5 shows a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification. The data processing apparatus 300 shown in fig. 5 may be the service system server 130 or the database server 140 in the internet access system shown in fig. 1, or may be an offline server not shown in the internet access system shown in fig. 1.
As shown in fig. 5, the data processing apparatus 300 may include:
an object acquisition module 310, configured to acquire a plurality of object nodes and the blood-edge relationship among the plurality of object nodes; the blood-edge relationship is used for representing the upstream and downstream relationships among the plurality of object nodes, and each object node carries a label set;
a target determining module 320, configured to determine a target object node from a plurality of object nodes, where a set of markers carried by the target object node is a non-empty set;
an object querying module 330, configured to query a downstream object node having a blood relationship with a target object node among the plurality of object nodes;
the tag adding module 340 is configured to add a tag in the tag set carried by the target object node to the tag set carried by the downstream object node;
the iteration determining module 350 is configured to update the downstream object node to the target object node, and cause the object querying module 330 to continue to query the downstream object node having a blood relationship with the target object node in the plurality of object nodes until a preset condition is satisfied.
According to one or more embodiments of the present disclosure, a plurality of object nodes and a blood-edge relationship representing the upstream and downstream relationships among them can be obtained, and a target object node carrying a non-empty label set can be determined among the plurality of object nodes. Iterative queries are then performed based on the blood-edge relationship, and the target object node is used to add labels to the downstream object nodes that have a blood-edge relationship with it, so that labeling of the plurality of object nodes is complete when a preset condition is satisfied. Because the blood-edge relationship among the object nodes is used, the reliability of labeling is improved, wrong or missing labels are avoided, and the efficiency of labeling the object nodes is improved.
In an embodiment of the present specification, optionally, the object node includes at least one of a gateway node, a service node, and a field node.
For example, if the blood-edge relationship represents an online data blood-edge relationship, the object node may include a service node corresponding to the service system and a field node corresponding to a data field in a database accessed by the service system, so that the online data blood-edge relationship may represent an upstream-downstream relationship between different service nodes and field nodes.
For another example, if the blood-edge relationship represents the blood-edge relationship of the data reporting system, the object node may include a field node corresponding to a data field in a database of the data warehouse system and a field node corresponding to a data field in a data report of the data reporting system, so that the blood-edge relationship of the data reporting system may represent an upstream-downstream relationship between different field nodes.
For another example, if the blood-edge relationship represents a data blood-edge relationship, the object nodes may include field nodes corresponding to data fields in different databases, so that the data blood-edge relationship may represent the upstream and downstream relationships between the different field nodes.
For another example, if the blood-edge relationship represents an external service blood-edge relationship, upstream and downstream relationships between a gateway node, which corresponds to a gateway interface used for accessing the service system, and the service nodes and field nodes in the online data blood-edge relationship may be added on the basis of the online data blood-edge relationship.
In some embodiments of the present description, the object acquisition module 310 may be specifically configured to:
acquiring a blood-edge relationship data table;
parsing node information and edge information in the blood-edge relationship data table to obtain the plurality of object nodes and the blood-edge relationship among the plurality of object nodes.
In this way, the blood-edge relationship data table corresponding to the Internet access system can be used directly to obtain the plurality of object nodes and the blood-edge relationship among them. Because the blood-edge relationship data table is in a two-dimensional table format, the resource usage and the amount of data processed when labeling the object nodes can be reduced, which improves data processing efficiency.
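As a rough illustration, the parsing step might look like the following Python sketch. The two-column row layout (upstream ID, downstream ID) is an assumed schema, since the specification does not give the columns of the data table, and the function name `parse_lineage_table` is likewise hypothetical.

```python
def parse_lineage_table(rows):
    """Build the node set and downstream adjacency from lineage rows.

    rows: iterable of (upstream_id, downstream_id) pairs; this two-column
          layout is an assumed schema, not one given in the specification.
    """
    nodes = set()
    edges = {}
    for upstream, downstream in rows:
        nodes.update((upstream, downstream))              # node information
        edges.setdefault(upstream, []).append(downstream) # edge information
    return nodes, edges
```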
In some embodiments of the present description, the data processing apparatus 300 may further include:
an object determining module, configured to determine, among the plurality of object nodes, the object node whose node identifier is the same as that of the object node to which the target data field belongs;
and a mark initialization module, configured to add the node identifier of the determined object node to the label set carried by that object node.
Since each data field in each database of the Internet access system has a one-to-one corresponding object node, the node ID of the object node to which the target data field belongs can be determined, and this node ID can be used to determine whether an object node with the same node ID exists among the plurality of object nodes to be labeled. If such an object node exists, its node ID is added to the label set it carries.
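Label initialization can be sketched as follows. The function name `initialize_labels` and its parameters are hypothetical; the sketch assumes that the node IDs of the object nodes owning target data fields are already known.

```python
def initialize_labels(node_ids, target_field_node_ids):
    """Give every node a label set; seed nodes that own a target data field."""
    labels = {node_id: set() for node_id in node_ids}
    for node_id in target_field_node_ids:
        if node_id in labels:  # only nodes present in the graph are seeded
            labels[node_id].add(node_id)
    return labels
```

Nodes seeded in this way carry non-empty label sets and can therefore serve as the initial target object nodes.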
In some embodiments of the present description, the tag adding module 340 may be specifically configured to:
judging whether a mark set carried by a downstream object node is an empty set or not;
if the mark set carried by the downstream object node is an empty set, adding the node identifier corresponding to the downstream object node and the mark in the mark set carried by the target object node to the mark set carried by the downstream object node;
if the label set carried by the downstream object node is a non-empty set, adding the label in the label set carried by the target object node to the label set carried by the downstream object node.
In other embodiments of the present disclosure, the tag adding module 340 may be further specifically configured to:
and adding the marks which are not in the mark set carried by the downstream object node in the mark set carried by the target object node to the mark set carried by the downstream object node.
Therefore, the labels in the label set carried by the downstream object node can be de-duplicated, so that one target data field associated with the label set is only labeled once, thereby reducing the data processing amount in the labeling process and the data processing amount when the data processing is performed by using the labeling result.
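The behavior of the tag adding module described above can be sketched in Python with built-in sets, whose union operation discards duplicates. The function name `add_labels` and the dictionary-based storage are assumptions made for illustration.

```python
def add_labels(target_id, downstream_id, labels):
    """Copy the target's labels to the downstream node; return the new labels."""
    target_marks = labels[target_id]
    downstream_marks = labels.setdefault(downstream_id, set())
    before = set(downstream_marks)
    if not downstream_marks:
        # Empty set: the downstream node also records its own node identifier.
        downstream_marks.add(downstream_id)
    downstream_marks |= target_marks  # set union drops duplicates
    return downstream_marks - before
```

Returning only the newly added labels makes it easy for a caller to detect a round in which nothing changed.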
In some embodiments of the present description, the preset condition includes: the number of queries reaches a preset number. The query iteration ends when the number of queries reaches the preset number, so the number of query iterations can be controlled and labeling the object nodes does not take too long.
In some embodiments of the present description, the preset condition includes: no new label is added to the label set carried by any downstream object node. When only labels that are not yet in the label set carried by the downstream object node are added, the query iteration can be ended if, after a query iteration, no new label has been added to the label set carried by any downstream object node. This improves the reliability of labeling the object nodes.
In some embodiments of the present disclosure, if the blood-edge relationship represents an external service blood-edge relationship, the data processing apparatus 300 may further include:
a first node determining module, configured to determine, when an object node carrying a non-empty label set is a gateway node, the gateway interface corresponding to the gateway node;
a first data monitoring module, configured to perform content security detection on the data transmitted through the gateway interface.
Specifically, if an object node carrying a non-empty label set is a gateway node, it can be determined that the gateway interface corresponding to the gateway node reveals the target data field. Content security detection therefore needs to be performed on the data transmitted through the gateway interface, and data whose detection result does not meet the security detection requirement is intercepted.
In some embodiments of the present description, the data processing apparatus 300 may further include:
a second node determining module, configured to determine, when an object node carrying a non-empty label set is a field node, the data field corresponding to the field node;
a second data monitoring module, configured to set user permissions for the data report if the data field is a data field in a data report;
and a third data monitoring module, configured to desensitize the data field if the data field is a data field in a database.
Specifically, if the blood-edge relationship represents the blood-edge relationship of the data report system and an object node carrying a non-empty label set is a field node corresponding to a data report, it can be determined that the data report system reveals the target data field, and user permissions need to be set for the data report so that only users with permission can view the target data field. If the blood-edge relationship represents the data blood-edge relationship and an object node carrying a non-empty label set is a field node corresponding to a database, it can be determined that the target data field is revealed in the database, and the data fields in the database that are associated with the target data field need to be desensitized.
In this specification, a plurality of processing units may be used to perform distributed processing on a plurality of object nodes.
In some embodiments of the present disclosure, in a case where the target object node is located in a first processing unit of the plurality of processing units and the downstream object node is located in a second processing unit of the plurality of processing units, the tag adding module 340 may further be specifically configured to:
causing the first processing unit to asynchronously transmit the label set carried by the target object node to the second processing unit;
causing the second processing unit to add the labels in the received label set to the label set carried by the downstream object node.
In these embodiments, optionally, the data processing apparatus 300 may further include:
the hash value determining module is used for determining the hash value of the node identifier corresponding to each object node;
the hash value processing module is used for performing modulo calculation processing on the hash value corresponding to each object node and the number of the processing units to obtain a remainder corresponding to the object node;
and the object allocation module is used for allocating the object node to the processing unit corresponding to the remainder according to the remainder corresponding to each object node.
Therefore, each processing unit only carries out query iteration on the object nodes positioned in the processing unit, a large number of object nodes can be subjected to decentralized processing, the resource occupancy rate during data processing is reduced, and the data processing efficiency is improved.
Fig. 6 is a schematic diagram showing a hardware configuration of a data processing device according to an embodiment of the present specification. As shown in Fig. 6, the data processing device 400 includes an input device 401, an input interface 402, a central processor 403, a memory 404, an output interface 405, and an output device 406. The input interface 402, the central processor 403, the memory 404, and the output interface 405 are connected to each other through the bus 410, and the input device 401 and the output device 406 are connected to the bus 410 through the input interface 402 and the output interface 405, respectively, and thereby to the other components of the data processing device 400.
Specifically, the input device 401 receives input information from the outside, and transmits the input information to the central processor 403 through the input interface 402; the central processor 403 processes the input information based on computer executable instructions stored in the memory 404 to generate output information, temporarily or permanently stores the output information in the memory 404, and then transmits the output information to the output device 406 through the output interface 405; the output device 406 outputs the output information to the outside of the data processing device 400 for use by a user.
That is, the data processing apparatus shown in fig. 6 may also be implemented to include: a memory storing computer-executable instructions; and a processor that, when executing computer-executable instructions, can implement the data processing methods and apparatus described in embodiments of the present specification.
The present description also provides a computer-readable storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the data processing methods provided by the embodiments of the present specification.
The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the specification are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet, intranets, etc.
It should also be noted that the foregoing describes specific embodiments of the present invention. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the foregoing, only the specific embodiments of the present disclosure are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present disclosure is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure, and these modifications or substitutions should be included in the scope of the present disclosure.

Claims (23)

1. A data processing method, applied to a server, the method comprising:
acquiring a plurality of object nodes and a blood relationship among the plurality of object nodes; the blood relationship is used for representing upstream and downstream relationships among the plurality of object nodes, each object node carries a mark set, the object nodes comprise at least one of a gateway node, a service node and a field node, and each object node corresponds to one data field;
determining a target object node in a plurality of object nodes, wherein a mark set carried by the target object node is a non-empty set;
querying a downstream object node with a blood relationship with the target object node in a plurality of object nodes;
adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node;
updating the downstream object node into a target object node, and continuously executing query on the downstream object nodes with blood relationship with the target object node in a plurality of object nodes until a preset condition is met;
wherein, when the target object node is located in a first processing unit of a plurality of processing units and the downstream object node is located in a second processing unit of the plurality of processing units, adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node comprises:
causing the first processing unit to asynchronously transmit the mark set carried by the target object node to the second processing unit;
causing the second processing unit to add the marks in the received mark set to the mark set carried by the downstream object node.
2. The method of claim 1, wherein the preset condition comprises: the query times reach the preset times.
3. The method of claim 1, wherein the preset condition comprises: no new markers are added in the marker set carried by all the downstream object nodes.
4. The method of claim 1, wherein, before determining the target object node among the plurality of object nodes, the method further comprises:
determining, among the plurality of object nodes, the object node whose node identifier is the same as that of the object node to which a target data field belongs;
and adding the node identifier of the determined object node to the mark set carried by that object node.
5. The method of claim 4, wherein the adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node comprises:
if the mark set carried by the downstream object node is an empty set, adding the node identifier corresponding to the downstream object node and the marks in the mark set carried by the target object node to the mark set carried by the downstream object node.
6. The method of claim 1, wherein the adding the marks in the mark set carried by the target object node to the mark set carried by the downstream object node comprises:
adding, to the mark set carried by the downstream object node, the marks in the mark set carried by the target object node that do not exist in the mark set carried by the downstream object node.
7. The method of claim 1, wherein, after the preset condition is met, the method further comprises:
determining, when an object node carrying a non-empty mark set is a gateway node, a gateway interface corresponding to the gateway node;
and performing content security detection on the data transmitted through the gateway interface.
8. The method of claim 1, wherein, after the preset condition is met, the method further comprises:
determining, when an object node carrying a non-empty mark set is a field node, a data field corresponding to the field node;
if the data field is a data field in a data report, setting user permissions for the data report;
and if the data field is a data field in a database, desensitizing the data field.
9. The method of claim 1, wherein the acquiring a plurality of object nodes and a blood relationship among the plurality of object nodes comprises:
acquiring a blood relationship data table;
parsing node information and edge information in the blood relationship data table to obtain the plurality of object nodes and the blood relationship among the plurality of object nodes.
10. The method of claim 9, further comprising:
determining a hash value of a node identifier corresponding to each object node;
performing modulo calculation on the hash value corresponding to each object node and the number of the processing units to obtain a remainder corresponding to the object node;
and distributing the object node to a processing unit corresponding to the remainder according to the remainder corresponding to each object node.
11. A data processing apparatus for application to a server, the apparatus comprising:
an object acquisition module, configured to acquire a plurality of object nodes and a blood relationship among the plurality of object nodes; the blood relationship is used for representing upstream and downstream relationships among the plurality of object nodes, each object node carries a mark set, the object nodes comprise at least one of a gateway node, a service node and a field node, and each object node corresponds to one data field;
a target determining module, configured to determine a target object node among the plurality of object nodes, wherein the mark set carried by the target object node is a non-empty set;
the object query module is used for querying downstream object nodes with blood relationship with the target object node in the plurality of object nodes;
the mark adding module is used for adding the mark in the mark set carried by the target object node to the mark set carried by the downstream object node;
the iteration determining module is used for updating the downstream object node into a target object node, and enabling the object inquiring module to continuously execute inquiring on the downstream object node with blood relationship with the target object node in the plurality of object nodes until a preset condition is met;
wherein, in the case that the target object node is located in a first processing unit of a plurality of processing units and the downstream object node is located in a second processing unit of the plurality of processing units, the mark adding module is specifically configured to:
cause the first processing unit to asynchronously transmit the mark set carried by the target object node to the second processing unit;
cause the second processing unit to add the marks in the received mark set to the mark set carried by the downstream object node.
12. The apparatus of claim 11, wherein the preset condition comprises: the query times reach the preset times.
13. The apparatus of claim 11, wherein the preset condition comprises: no new markers are added in the marker set carried by all the downstream object nodes.
14. The apparatus of claim 11, wherein the apparatus further comprises:
an object determining module, configured to determine, among the plurality of object nodes, the object node whose node identifier is the same as that of the object node to which a target data field belongs;
and the mark initialization module is used for adding the determined node identifier of the object node to the mark set carried by the object node.
15. The apparatus of claim 14, wherein the mark adding module is specifically configured to:
if the mark set carried by the downstream object node is an empty set, add the node identifier corresponding to the downstream object node and the marks in the mark set carried by the target object node to the mark set carried by the downstream object node.
16. The apparatus of claim 11, wherein the mark adding module is specifically configured to:
add, to the mark set carried by the downstream object node, the marks in the mark set carried by the target object node that do not exist in the mark set carried by the downstream object node.
17. The apparatus of claim 11, wherein the object node comprises at least one of a gateway node, a traffic node, and a field node.
18. The apparatus of claim 11, wherein the apparatus further comprises:
a first node determining module, configured to determine, when an object node carrying a non-empty mark set is a gateway node, a gateway interface corresponding to the gateway node;
and the first data monitoring module is used for carrying out content security detection on the data transmitted by the gateway interface.
19. The apparatus of claim 11, wherein the apparatus further comprises:
a second node determining module, configured to determine, when an object node carrying a non-empty mark set is a field node, a data field corresponding to the field node;
a second data monitoring module, configured to set user permissions for the data report if the data field is a data field in a data report;
and a third data monitoring module, configured to desensitize the data field if the data field is a data field in a database.
20. The apparatus of claim 11, wherein the object acquisition module is specifically configured to:
acquiring a blood edge relation data table;
parsing node information and edge information in the blood relationship data table to obtain the plurality of object nodes and the blood relationship among the plurality of object nodes.
21. The apparatus of claim 11, wherein the apparatus further comprises:
the hash value determining module is used for determining the hash value of the node identifier corresponding to each object node;
the hash value processing module is used for performing modulo calculation processing on the hash value corresponding to each object node and the number of the processing units to obtain a remainder corresponding to the object node;
and the object allocation module is used for allocating the object node to the processing unit corresponding to the remainder according to the remainder corresponding to each object node.
22. A data processing apparatus, the apparatus comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a data processing method as claimed in any one of claims 1-10.
23. A computer-readable storage medium, on which computer program instructions are stored which, when executed by a processor, implement a data processing method according to any one of claims 1-10.
CN201911183944.6A 2019-11-27 2019-11-27 Data processing method, device, equipment and medium Active CN111046242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911183944.6A CN111046242B (en) 2019-11-27 2019-11-27 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111046242A (en) 2020-04-21
CN111046242B (en) 2023-09-26

Family

ID=70233824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911183944.6A Active CN111046242B (en) 2019-11-27 2019-11-27 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111046242B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672653A (en) * 2021-08-09 2021-11-19 支付宝(杭州)信息技术有限公司 Method and device for identifying private data in database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997369A (en) * 2016-01-26 2017-08-01 阿里巴巴集团控股有限公司 Data clearing method and device
CN109325078A (en) * 2018-09-18 2019-02-12 拉扎斯网络科技(上海)有限公司 Method and device is determined based on the data blood relationship of structured data
CN109739894A (en) * 2019-01-04 2019-05-10 深圳前海微众银行股份有限公司 Supplement method, apparatus, equipment and the storage medium of metadata description

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030154197A1 (en) * 2002-02-13 2003-08-14 Permutta Technologies Flexible relational data storage method and apparatus

Also Published As

Publication number Publication date
CN111046242A (en) 2020-04-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant