CN115328993A - Data processing method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN115328993A (CN202210981624.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- entity
- current
- incremental
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Abstract
The present disclosure provides a data processing method, apparatus, storage medium, and electronic device, and relates to the technical field of data processing. The method comprises: acquiring current full data of a target service, and constructing a plurality of connected subgraphs according to the current full data, wherein the current full data at least comprises current incremental data; identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data; and determining a first connected subgraph corresponding to the current incremental data, and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph. By computing the connected graph based on the incremental data, the disclosure can reduce the consumption of computing resources and improve data processing efficiency.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device.
Background
As an important form of relational data representation, the connected graph is widely used in service data analysis scenarios. For example, a set of connected graphs corresponding to a service may be computed and then mined further to analyze the service data.
As the service data is updated, the connected graphs in the set corresponding to the service also need to be updated. At present, when a connected graph is computed from the full data of the related service, every time the data changes, the changed data must be added to the historical full data, and the whole computation is then performed again on the latest full data. Because the connected graph of the historical full data has already been computed at the system's cold start, the historical full data is recomputed in every daily run, which wastes computing resources, makes the whole computation take too long, and lowers data processing efficiency.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device, so as to overcome, at least to a certain extent, the problems in the related art of a large data volume in connected graph computation, wasted computing resources, and low data processing efficiency caused by an overly long computation.
According to a first aspect of the present disclosure, there is provided a data processing method comprising:
acquiring current full data of a target service, and constructing a plurality of connected subgraphs according to the current full data, wherein the current full data at least comprises current incremental data;
identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data;
and determining a first connected subgraph corresponding to the current incremental data, and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
In an exemplary embodiment of the disclosure, the data pattern of the current incremental data includes at least any one of:
the current incremental data is first newly added data, the first newly added data does not affect the relationships between the connected subgraphs, and the first newly added data comprises first newly added entity data and first newly added relationship data;
the current incremental data is changed data, the changed data does not affect the relationships between the connected subgraphs, and the changed data comprises changed entity data and changed relationship data;
the current incremental data is second newly added relationship data, and the second newly added relationship data creates a relationship between entity data in the connected subgraphs;
and the current incremental data is second newly added entity data, and the second newly added entity data has a relationship with entity data in a connected subgraph.
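The four data modes listed above can be captured as an enumeration. This is only a minimal sketch: the class name `DataMode`, its member names, and the `MERGING_MODES` set are hypothetical labels chosen for illustration and are not terms defined by the disclosure.

```python
from enum import Enum


class DataMode(Enum):
    """Hypothetical labels for the four incremental-data modes."""

    # New entities/relations that do not touch any existing connected subgraph.
    FIRST_NEW_DATA = "first_new_data"
    # Attribute changes to existing entities/relations; topology is unchanged.
    CHANGED_DATA = "changed_data"
    # A new relation linking entities that already belong to connected subgraphs.
    SECOND_NEW_RELATION = "second_new_relation"
    # A new entity linked by a relation to an entity in an existing subgraph.
    SECOND_NEW_ENTITY = "second_new_entity"


# Modes that change how subgraphs relate to each other, so a merge is needed.
MERGING_MODES = {DataMode.SECOND_NEW_RELATION, DataMode.SECOND_NEW_ENTITY}
```

Separating the merging modes from the others mirrors the distinction the text draws between data that does and does not affect the relationships between connected subgraphs.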
In an exemplary embodiment of the present disclosure, the current full volume data further comprises historical full volume data; after obtaining the current full data of the target service, the method further comprises:
constructing a full data recording table according to the entity data and the relation data in the historical full data, wherein the full data recording table comprises a full entity table and a full relation table;
constructing an incremental data record table according to the entity data and the relationship data in the current incremental data, wherein the incremental data record table comprises an incremental entity table and an incremental relationship table;
and constructing a pattern recognition result table according to the first field in the full data record table and the second field in the incremental data record table, wherein the pattern recognition result table comprises a pattern recognition entity table and a pattern recognition relation table.
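One way to picture the pattern recognition entity table is as a left join of the incremental entity table onto the full entity table by `id`. The sketch below assumes that the "first field" taken from the full table is `componentId` plus the historical `propsMd5`, and that the "second field" taken from the incremental table is the current `id`, `label`, and `propsMd5`; the disclosure does not spell out this choice, and the function name and the `histPropsMd5` key are hypothetical.

```python
def build_pattern_entity_table(full_entities, incr_entities):
    """Left-join incremental entity rows onto the full entity table by id.

    Each row is a dict. Rows absent from the full table get componentId=None
    and histPropsMd5=None, which later marks them as newly added.
    """
    full_by_id = {row["id"]: row for row in full_entities}
    result = []
    for row in incr_entities:
        hist = full_by_id.get(row["id"])
        result.append({
            "id": row["id"],
            "label": row["label"],
            "propsMd5": row["propsMd5"],                          # current value
            "histPropsMd5": hist["propsMd5"] if hist else None,   # historical value
            "componentId": hist["componentId"] if hist else None,
        })
    return result
```

A pattern recognition relation table could be built the same way from the two relation tables.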
In an exemplary embodiment of the present disclosure, the target field is a connected sub-graph identification field; the identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data includes:
acquiring a unique identifier of the current incremental data, and judging whether the connected subgraph identification field in the pattern recognition result table row containing the unique identifier is empty;
and if the connected subgraph identification field is empty, determining that the current incremental data is first newly added data.
In an exemplary embodiment of the present disclosure, the target fields are a connected sub-graph identification field, a history attribute field, and a current attribute field; the identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data includes:
acquiring a unique identifier of the current incremental data, judging whether the connected subgraph identification field in the pattern recognition result table row containing the unique identifier is empty, and judging whether the current attribute field in the pattern recognition result table is the same as the historical attribute field;
and if the connected sub-graph identification field is not empty and the current attribute field is different from the historical attribute field, determining that the current incremental data is changed data.
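The two field checks described in the preceding embodiments (an empty connected subgraph identifier means first newly added data; a non-empty identifier with differing attribute hashes means changed data) can be sketched as a single classifier. The key names `componentId`, `propsMd5`, and `histPropsMd5`, the function name, and the `"unchanged"` fallback are assumptions for illustration.

```python
def classify_record(row):
    """Classify one row of a pattern recognition result table.

    `row` is a dict with the assumed keys `componentId`, `propsMd5`
    (current attribute hash) and `histPropsMd5` (historical attribute hash).
    """
    # Empty connected-subgraph identifier: the record was never seen before.
    if row.get("componentId") in (None, ""):
        return "first_new_data"
    # Known record whose attribute hash differs from the historical one.
    if row["propsMd5"] != row["histPropsMd5"]:
        return "changed_data"
    # Known record with identical attributes: nothing to update.
    return "unchanged"
```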
In an exemplary embodiment of the present disclosure, the identifying, based on a data record table corresponding to the current incremental data, a target field in the current incremental data to determine a data mode of the current incremental data includes:
inquiring first newly-added relation data in a pattern recognition relation table, and determining target entity data connected with the first newly-added relation data;
judging whether a connected sub-graph identification field corresponding to the target entity data in the full entity table is empty or not;
and if the connected sub-graph identification field is not empty, determining that the current incremental data is second newly-added relation data.
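The lookup above can be sketched as follows. The sketch assumes that a newly added relation counts as second newly added relationship data when both of its endpoints already carry a connected subgraph identifier in the full entity table; the text only speaks of "target entity data", so this both-endpoints reading, like the function name, is an assumption.

```python
def find_second_new_relations(pattern_relations, full_entities):
    """Return (relation id, src component, dst component) for each new
    relation whose two endpoints already belong to connected subgraphs."""
    component_of = {e["id"]: e.get("componentId") for e in full_entities}
    hits = []
    for rel in pattern_relations:
        if rel.get("componentId") is not None:
            continue  # relation already exists in the historical data
        src_comp = component_of.get(rel["srcId"])
        dst_comp = component_of.get(rel["dstId"])
        if src_comp is not None and dst_comp is not None:
            hits.append((rel["id"], src_comp, dst_comp))
    return hits
```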
In an exemplary embodiment of the present disclosure, the identifying, based on a data record table corresponding to the current incremental data, a target field in the current incremental data to determine a data mode of the current incremental data includes:
querying first newly added entity data in the pattern recognition entity table;
querying target relationship data connected with the first new entity data in a pattern recognition relationship table, wherein the target relationship data is connected with the first new entity data and historical entity data;
judging whether a connected sub-graph identification field corresponding to the historical entity data in the full entity table is empty or not;
and if the connected sub-graph identification field is not empty, determining that the current incremental data is second newly added entity data.
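A minimal sketch of this check, assuming dict-shaped rows with the field names of Tables 1 and 2: a new entity (empty `componentId` in the pattern recognition entity table) is reported together with the identifiers of the historical subgraphs it is linked to. The function name and return shape are hypothetical.

```python
def find_second_new_entities(pattern_entities, pattern_relations, full_entities):
    """Map each new entity id to the set of historical component ids
    it is connected to through a relation."""
    component_of = {e["id"]: e.get("componentId") for e in full_entities}
    new_ids = {e["id"] for e in pattern_entities if e.get("componentId") is None}
    hits = {}
    for rel in pattern_relations:
        # A relation may point either way between new and historical entity.
        for new_id, other_id in ((rel["srcId"], rel["dstId"]),
                                 (rel["dstId"], rel["srcId"])):
            if new_id in new_ids and component_of.get(other_id) is not None:
                hits.setdefault(new_id, set()).add(component_of[other_id])
    return hits
```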
In an exemplary embodiment of the present disclosure, the current incremental data is first newly added data; the updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph includes:
acquiring first newly added entity data in the pattern recognition entity table and first newly added relationship data in the pattern recognition relation table;
and constructing the second connected subgraph according to the first newly added entity data and the first newly added relationship data.
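Because first newly added data does not touch any historical subgraph, the new connected subgraphs can be computed from the incremental data alone. The sketch below uses a union-find (disjoint-set) structure, which is one common choice and not necessarily the algorithm the disclosure has in mind; the function name and input shapes are hypothetical.

```python
def build_new_subgraphs(new_entity_ids, new_relations):
    """Group new entities into connected subgraphs with union-find.

    `new_relations` is an iterable of (srcId, dstId) pairs whose endpoints
    must all appear in `new_entity_ids`.
    """
    parent = {eid: eid for eid in new_entity_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for src, dst in new_relations:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # union the two groups

    groups = {}
    for eid in new_entity_ids:
        groups.setdefault(find(eid), set()).add(eid)
    return list(groups.values())
```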
In an exemplary embodiment of the present disclosure, the current incremental data is changed data; the updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph includes:
and replacing the historical data in the first connected subgraph with the changed data to obtain a second connected subgraph.
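For changed data the topology stays the same, so the update amounts to overwriting attributes in place. A minimal sketch, assuming records are dicts keyed by `id` with the field names of Table 1 (the function name is hypothetical):

```python
def apply_changed_data(full_by_id, changed_rows):
    """Overwrite attributes of existing records with their changed values.

    `full_by_id` maps id -> record dict; `changed_rows` carry the new
    `props` and `propsMd5`. `componentId` is deliberately left untouched
    because changed data does not alter subgraph membership.
    """
    for row in changed_rows:
        record = full_by_id[row["id"]]
        record["props"] = row["props"]
        record["propsMd5"] = row["propsMd5"]
    return full_by_id
```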
In an exemplary embodiment of the present disclosure, the current incremental data is second newly added relationship data; the updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph includes:
determining a plurality of first connected subgraphs connected by the second newly added relationship data;
and merging the plurality of first connected subgraphs to obtain the second connected subgraph.
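Merging subgraphs can be sketched as relabelling: every entity in any of the affected subgraphs receives one shared identifier. Keeping the smallest identifier as the survivor is an arbitrary convention assumed here, and the function name is hypothetical.

```python
def merge_components(component_of, components_to_merge):
    """Return a new id -> componentId mapping in which all listed
    components share a single identifier (the smallest one)."""
    targets = set(components_to_merge)
    merged_id = min(targets)
    return {
        eid: (merged_id if comp in targets else comp)
        for eid, comp in component_of.items()
    }
```

Only entities in the affected subgraphs are relabelled; all other subgraphs keep their identifiers, which is what lets the historical results be reused.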
In an exemplary embodiment of the present disclosure, the current incremental data is second new entity data; updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph, wherein the method comprises the following steps:
constructing a target connected subgraph according to first newly added entity data and target relationship data connected with the first newly added entity data, wherein the target relationship data connects the first newly added entity data and historical entity data;
determining a first connected subgraph corresponding to the historical entity data;
and merging the target connected subgraph and the first connected subgraph to obtain the second connected subgraph.
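When the target connected subgraph consists of just a new entity and the relation linking it to a historical entity, the merge reduces to giving the new entity the historical entity's subgraph identifier. The sketch below makes that simplifying assumption, and further assumes each new entity links to only one historical subgraph (linking to several would additionally require merging those subgraphs). All names are hypothetical.

```python
def attach_new_entities(component_of, new_entity_ids, target_relations):
    """Assign each new entity the componentId of the historical entity
    it is connected to.

    `component_of` maps historical entity id -> componentId;
    `target_relations` is an iterable of (srcId, dstId) pairs.
    """
    updated = dict(component_of)
    for src, dst in target_relations:
        # The new entity may sit at either end of the relation.
        for new_id, hist_id in ((src, dst), (dst, src)):
            if new_id in new_entity_ids and hist_id in component_of:
                updated[new_id] = component_of[hist_id]
    return updated
```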
In an exemplary embodiment of the present disclosure, after obtaining the second connected subgraph, the method further includes:
and performing equity penetration on the second connected subgraph to determine the equity data of each target entity in the second connected subgraph.
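The disclosure does not specify the equity penetration algorithm. As a purely illustrative sketch, one common convention computes a holder's effective stake in a target company by multiplying holding ratios along each ownership path and summing over paths. The code assumes the ownership graph is acyclic and that edges are `(holder, company, ratio)` tuples; both the representation and the function name are assumptions.

```python
def effective_holdings(edges, target):
    """Walk upward from `target` and accumulate each holder's effective
    stake as the sum over paths of the product of holding ratios.

    Assumes no circular shareholding; a cycle would recurse forever.
    """
    holders_of = {}
    for holder, company, ratio in edges:
        holders_of.setdefault(company, []).append((holder, ratio))

    totals = {}

    def walk(company, weight):
        for holder, ratio in holders_of.get(company, []):
            share = weight * ratio
            totals[holder] = totals.get(holder, 0.0) + share
            walk(holder, share)  # continue penetrating upward

    walk(target, 1.0)
    return totals
```

Because the walk never leaves the connected subgraph containing `target`, it only needs to be rerun for subgraphs that the incremental update actually changed.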
According to a second aspect of the present disclosure, there is provided a data processing apparatus comprising:
the connected subgraph construction module is used for obtaining current full data of a target service and constructing a plurality of connected subgraphs according to the current full data, wherein the current full data comprises historical full data and current incremental data;
the data mode determining module is used for identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data so as to determine the data mode of the current incremental data;
and the connected subgraph updating module is used for determining a first connected subgraph corresponding to the current incremental data and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
According to a third aspect of the present disclosure, there is provided a data processing apparatus comprising:
a connected subgraph obtaining module, configured to obtain a second connected subgraph in the data processing method;
and the equity data determining module is used for performing equity penetration on the second connected subgraph to determine the equity data of each target entity in the second connected subgraph.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
According to a fifth aspect of the present disclosure, there is provided an electronic apparatus comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
In the data processing method provided by the example embodiments of the present disclosure, current full-scale data of a target service is obtained and a plurality of connected subgraphs are constructed according to the current full-scale data, wherein the current full-scale data comprises historical full-scale data and current incremental data; a target field in the current incremental data is identified based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data; and a first connected subgraph corresponding to the current incremental data is determined and updated according to the data mode of the current incremental data to obtain a second connected subgraph. By computing the connected graph based on the incremental data, the method can reduce the consumption of computing resources and improve data processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a system architecture to which the data processing method and apparatus of the embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a data processing method in an embodiment of the disclosure;
FIG. 3 schematically illustrates a connected subgraph in an embodiment of the disclosure;
FIG. 4 is a schematic diagram illustrating an update of a connected subgraph in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a connected subgraph in one incremental data mode according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating a connected subgraph in another incremental data mode according to an embodiment of the present disclosure;
FIG. 7 schematically shows a block diagram of a data processing apparatus in an embodiment of the present disclosure;
FIG. 8 is a block diagram that schematically illustrates an equity penetration device, in an embodiment of the disclosure;
FIG. 9 schematically illustrates a structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a data processing method and apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The terminal devices 101, 102, 103 may be various electronic devices including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. The server 105 may be a server, a server cluster formed by a plurality of servers, a virtualization platform, or a cloud computing service center. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
The data processing method provided by the exemplary embodiment of the present disclosure may be performed by the server 105, and accordingly, the data processing apparatus may be provided in the server 105. For example, after the server 105 receives the current full-volume data input by the terminal device 101, by executing the data processing method, the connected subgraph of the historical full-volume data and the second connected subgraph corresponding to the incremental data can be sent to the terminal device 101 to be displayed to the relevant user. However, it is easily understood by those skilled in the art that the data processing method provided by the exemplary embodiment of the present disclosure may also be executed by one or more of the terminal devices 101, 102, 103, and accordingly, the data processing apparatus may also be disposed in the terminal devices 101, 102, 103. For example, the terminal device 101 executes the data processing method, and may directly calculate a connected subgraph of the current full-volume data, and directly display the calculated connected subgraph of the historical full-volume data and a second connected subgraph corresponding to the incremental data on the display screen of the terminal device 101 to be displayed to the relevant user, which is not particularly limited in the present disclosure.
The technical scheme of the embodiment of the disclosure is explained in detail as follows:
In the exemplary embodiment of the present disclosure, a business scenario of calculating equity penetration in the financial field is taken as an example for explanation. Equity penetration analyzes the equity structure of an enterprise, penetrating upward through its shareholder companies and downward through its subsidiaries, so as to obtain the partner information of the shareholder companies or subsidiaries. Equity penetration is based on computing a connected graph of the service data. In graph theory, the connected graph is based on the concept of connectivity. An undirected graph G can be divided into a plurality of mutually isolated subgraphs by a connected graph algorithm.
At present, when a connected graph is computed from the full data of the related service, every time the data changes, the changed data must be added to the historical full data, and the whole computation is then performed again on the latest full data. For example, in a business scenario where a project calculates equity penetration, the project has about 1.3 billion records of historical full data, including entity data and relationship data. After the historical full data is stored in a graph database, about 43 million connected subgraphs can be generated by a connected graph algorithm. An equity penetration algorithm is then executed on each of the 43 million connected subgraphs to generate 43 million equity graphs, which are the output required in this business scenario.
However, each subsequent equity penetration calculation starts from the 1.3 billion records of historical full data plus the data newly added each day, and runs the connected graph algorithm and the equity penetration algorithm in sequence, so the whole calculation must be performed once every day and takes about 13 hours. In particular, the connected graph of the historical full data has already been computed at the system's cold start, yet the historical full data is recomputed in every subsequent daily run, which wastes computing resources and makes the whole calculation take too long.
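To make the graph-theory notion concrete, the following sketch splits an undirected graph into its mutually isolated connected subgraphs with a breadth-first search. It is a textbook technique shown for illustration only; the disclosure does not state which connected graph algorithm it uses, and the function name is hypothetical.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as sets.

    `vertices` is a list of hashable ids; `edges` is a list of (u, v) pairs
    whose endpoints all appear in `vertices`.
    """
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```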
Based on one or more of the above problems, the present exemplary embodiment provides a data processing method, in which a connected graph is calculated based on incremental data, so that the amount of data calculation can be reduced, the historical full-amount data that has been calculated before does not participate in subsequent calculation, and only new data needs to be calculated each time thereafter, thereby achieving the purposes of saving calculation resources and shortening calculation time. Referring to fig. 2, the data processing method may include steps S210 to S230:
S210: obtaining current full-scale data of a target service, and constructing a plurality of connected subgraphs according to the current full-scale data, wherein the current full-scale data at least comprises current incremental data;
S220: identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data;
S230: determining a first connected subgraph corresponding to the current incremental data, and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
In the data processing method provided by the example embodiments of the present disclosure, current full-scale data of a target service is obtained and a plurality of connected subgraphs are constructed according to the current full-scale data, wherein the current full-scale data comprises historical full-scale data and current incremental data; a target field in the current incremental data is identified based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data; and a first connected subgraph corresponding to the current incremental data is determined and updated according to the data mode of the current incremental data to obtain a second connected subgraph. By computing the connected graph based on the incremental data, the method can reduce the consumption of computing resources and improve data processing efficiency.
Next, the above-described steps of the present exemplary embodiment will be described in more detail.
In step S210, current full-volume data of a target service is obtained, and a plurality of connected subgraphs are constructed according to the current full-volume data, where the current full-volume data at least includes current incremental data.
In the exemplary embodiment of the present disclosure, in the business scenario of computing equity penetration in the financial field, the target service may be an enterprise equity data analysis service. The current full-volume data of the target service may be obtained according to a preset update period, which may be set according to actual needs, for example 1 day, 3 days, or one week; this disclosure does not specifically limit it. The current full-volume data may include historical full-volume data and current incremental data. For example, when the update period is 1 day, the current full-volume data is the full-volume data acquired on the current day, comprising the historical full-volume data and that day's incremental data; in this business scenario, about 1 million incremental records arrive per day.
After the current full data of the target service is obtained, a full data record table can be constructed according to the entity data and the relationship data in the historical full data, and the full data record table can comprise a full entity table and a full relationship table. Table 1 shows the structure of the full entity table and its fields. The table records, for the entity data in the historical full data, the unique identifier of the entity data, its label, its attributes, the identifier of the connected subgraph to which it belongs, and its attribute encryption value.
TABLE 1
id | label | props | componentId | propsMd5 |
Here, id represents the unique identifier of the entity data. It may be the primary key of the entity data, or a value computed from it, such as the Md5 value of the primary key, the Md5 value of the entity data label, or the Md5 value of the entity data attributes, which is not limited in this disclosure. label represents the label of the entity data, such as its category, e.g., company, legal person, natural person, etc. props represents the attributes of the entity data, such as age, gender, company name, etc.; the attributes are formatted as a JSON (JavaScript Object Notation, a data interchange format) string. componentId represents the result of the connected graph computation, namely the identifier of the connected subgraph to which the entity data belongs, such as a connected subgraph ID. propsMd5 represents the attribute encryption value of the entity data, such as the Md5 value of the attribute JSON string, and can be used to judge whether the attributes of the entity data have changed.
Table 2 shows the structure of the full relationship table and its fields. The table records, for the relationship data in the historical full-volume data, the unique identifier of the relationship data, its label, the identifiers and labels of the entity data it connects, its attributes, the identifier of the connected subgraph to which it belongs, and its attribute encryption value.
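The propsMd5 field can be illustrated with a short sketch that hashes the attribute JSON string. Serializing with sorted keys is an assumption added here so that the same attributes always produce the same string regardless of key order; the disclosure only says that the Md5 value of the attribute JSON string may be used. The function name is hypothetical.

```python
import hashlib
import json


def props_md5(props):
    """Return the MD5 hex digest of a canonical JSON form of `props`.

    Keys are sorted and separators fixed so that logically equal attribute
    dicts serialize to the same string and therefore the same digest.
    """
    canonical = json.dumps(props, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Comparing the digest stored in the full table with the digest of the incoming record is then enough to tell whether the attributes changed, without comparing the attribute strings field by field.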
TABLE 2
id | label | srcId | dstId | srcLabel | dstLabel | props | componentId | propsMd5 |
Here, id represents the unique identifier of the relationship data. It may be the primary key of the relationship data, such as an order number or a transaction serial number, or a value computed from the primary key, such as the MD5 value of the primary key, the MD5 value of the relationship data label, or the MD5 value of the relationship data attributes, which is not limited in this disclosure. label represents the label of the relationship data, such as its category. srcId represents the ID of the source entity of the two entity data connected by the relationship data, dstId represents the ID of the destination entity, srcLabel represents the label of the source entity, and dstLabel represents the label of the destination entity. The source entity is the entity from which the relationship starts, and the destination entity is the entity at which it ends. props represents the attributes of the relationship data, such as transfer amount, transaction time and the like, stored as a JSON string. componentId represents the result of the connected graph computation, namely the identifier (ID) of the connected subgraph to which the relationship data belongs. propsMd5 represents the attribute encryption value of the relationship data, such as the MD5 value of the attribute JSON string, and may be used to determine whether the attributes of the relationship data have changed.
Similarly, the current incremental data may be stored separately, and an incremental data record table may be constructed according to the entity data and the relationship data in the current incremental data, and the incremental data record table may include an incremental entity table and an incremental relationship table. As shown in table 3, a table structure of the incremental entity table and each field in the table are shown, and data such as a unique identifier of entity data in the current incremental data, a tag of the entity data, an attribute encryption value of the entity data, and a time for recording the entity data are recorded in the table.
TABLE 3
id | label | props | propsMd5 | date |
Wherein, date represents the time for recording the entity data, and the meanings of other fields are the same as those of the corresponding fields in table 1, which is not described herein again.
As shown in table 4, the table structure of the incremental relationship table and each field in the table are shown. For the relationship data in the current incremental data, the table records the unique identifier, the label, the identifiers and labels of the connected entity data, the attributes, the attribute encryption value, and the time at which the relationship data was recorded.
TABLE 4
id | label | srcId | dstId | srcLabel | dstLabel | props | propsMd5 | date |
Wherein, date represents the time at which the relationship data was recorded, and the meanings of the other fields are the same as those of the corresponding fields in table 2 and are not described again here.
After the full data record table and the incremental data record table are obtained, a pattern recognition result table can be constructed according to a first field in the full data record table and a second field in the incremental data record table, and the pattern recognition result table can comprise a pattern recognition entity table and a pattern recognition relation table.
Illustratively, a pattern recognition entity table may be constructed from a first field in the full entity table and a second field in the incremental entity table. The first field in the full entity table comprises the two fields propsMd5 and componentId, and the second field in the incremental entity table comprises the four fields id, label, props and propsMd5. When the date field in the incremental entity table is the current time and the id in the full entity table is the same as the id in the incremental entity table, the pattern recognition entity table may be constructed from the propsMd5 and componentId fields in the full entity table and the id, label, props and propsMd5 fields in the incremental entity table. In the pattern recognition entity table, the propsMd5 field taken from the full entity table is renamed to histPropsMd5. In addition, the pattern recognition entity table also comprises an incrPattern field, which represents the data pattern of the incremental data.
As shown in table 5, the table structure of the pattern recognition entity table and each field in the table are shown. The table records the unique identifier (id) of the entity data, the label (label) of the entity data, the attributes (props) of the entity data, the attribute encryption value (propsMd5) of the entity data, the identifier of the connected subgraph to which the entity data belongs (componentId), the historical attribute encryption value (histPropsMd5) of the entity data, and the data pattern (incrPattern) of the entity data.
TABLE 5
id | label | props | propsMd5 | componentId | histPropsMd5 | incrPattern |
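The construction of the pattern recognition entity table can be sketched as a join of the incremental entity table with the full entity table on id. The sketch below uses plain Python dictionaries as rows; the function name and the sample rows are hypothetical. It also assumes a left join, so that incremental rows without a match in the full entity table are kept with an empty componentId and histPropsMd5, which is what the later empty-field checks appear to rely on.

```python
def build_pattern_entity_table(full_entities, incr_entities, current_date):
    """Join the current period's incremental entity rows with the full entity table on id."""
    full_by_id = {row["id"]: row for row in full_entities}
    result = []
    for row in incr_entities:
        if row["date"] != current_date:
            continue  # keep only rows recorded in the current update period
        hist = full_by_id.get(row["id"])  # None when the id is not in the full table
        result.append({
            "id": row["id"],
            "label": row["label"],
            "props": row["props"],
            "propsMd5": row["propsMd5"],
            "componentId": hist["componentId"] if hist else None,
            "histPropsMd5": hist["propsMd5"] if hist else None,
            "incrPattern": None,  # filled in later by pattern recognition
        })
    return result


full = [{"id": "e1", "label": "company", "props": "{}",
         "componentId": 7, "propsMd5": "aaa"}]
incr = [
    {"id": "e1", "label": "company", "props": "{}", "propsMd5": "bbb", "date": "2022-08-16"},
    {"id": "e2", "label": "person", "props": "{}", "propsMd5": "ccc", "date": "2022-08-16"},
    {"id": "e3", "label": "person", "props": "{}", "propsMd5": "ddd", "date": "2022-08-15"},
]
table = build_pattern_entity_table(full, incr, "2022-08-16")
```

The pattern recognition relationship table could be built the same way, carrying srcId and dstId along with the other incremental fields.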
Similarly, the pattern recognition relationship table may be constructed from a first field in the full relationship table and a second field in the incremental relationship table. The first field in the full relationship table comprises the two fields propsMd5 and componentId, and the second field in the incremental relationship table comprises the six fields id, label, srcId, dstId, props and propsMd5. When the date field in the incremental relationship table is the current time and the id in the full relationship table is the same as the id in the incremental relationship table, the pattern recognition relationship table can be constructed from the propsMd5 and componentId fields in the full relationship table and the id, label, srcId, dstId, props and propsMd5 fields in the incremental relationship table. In the pattern recognition relationship table, the propsMd5 field taken from the full relationship table is renamed to histPropsMd5. In addition, the pattern recognition relationship table also comprises an incrPattern field, which represents the data pattern of the incremental data.
As shown in table 6, the table structure of the pattern recognition relationship table and each field in the table are shown. The table records the unique identifier (id) of the relationship data, the label (label) of the relationship data, the source entity ID (srcId) and destination entity ID (dstId) of the two entity data connected by the relationship data, the attributes (props) of the relationship data, the attribute encryption value (propsMd5) of the relationship data, the identifier of the connected subgraph to which the relationship data belongs (componentId), the historical attribute encryption value (histPropsMd5) of the relationship data, and the data pattern (incrPattern) of the relationship data.
TABLE 6
id | label | srcId | dstId | props | propsMd5 | componentId | histPropsMd5 | incrPattern |
After the current full data of the target service is obtained, connected graph computation can be performed on the current full data to obtain a plurality of connected subgraphs. Referring to fig. 3, a plurality of connected subgraphs (subgraph 1, subgraph 2, ..., subgraph n) are schematically shown; each connected subgraph is composed of entity data and relationship data, and any two entity data are connected through relationship data. Then, each entity data and relationship data can be recorded in the full data record table and the incremental data record table, including fields such as the ID of the connected subgraph to which each entity data and relationship data belongs.
By performing connected graph computation on the current full data and recording the entity data and relationship data of each connected subgraph in table form, the data patterns of the entity data and relationship data in the incremental data can be identified, so that incremental data in different data patterns can be processed flexibly and data processing efficiency is improved.
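The disclosure does not name a specific connected graph algorithm. As one possible illustration, the sketch below uses a union-find (disjoint-set) structure to assign a component identifier to every entity; the function name, the sample identifiers, and the convention of using the smallest member id as the component identifier are assumptions. It also assumes every relationship endpoint appears in the entity list.

```python
def connected_components(entity_ids, relations):
    """Map each entity id to a component id (the smallest id in its component)."""
    parent = {e: e for e in entity_ids}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in relations:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            # Always keep the smaller root so the component id is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
    return {e: find(e) for e in entity_ids}


components = connected_components(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
```

The resulting mapping corresponds to the componentId column: entities a, b and c share one identifier, and d and e share another.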
In step S220, a target field in the current incremental data is identified based on a data record table corresponding to the current incremental data, so as to determine a data mode of the current incremental data.
In an example embodiment of the present disclosure, the data pattern of the current incremental data may include at least any one of:
Mode one: the current incremental data is first newly added data, which does not affect the relationships between connected subgraphs and may include first newly added entity data and first newly added relationship data. For example, the first newly added data is data that does not exist in the historical full data; the first newly added entity data may be a newly added entity such as a new natural person or a new company, and the first newly added relationship data may be a newly added investment relationship.
Mode two: the current incremental data is changed data, which does not affect the relationships between connected subgraphs and may include changed entity data and changed relationship data. For example, the changed data is data that already exists in the historical full data and is modified in the current update; changed entity data may be an attribute change of an entity such as a natural person or a company, and changed relationship data may be an increase, decrease, or withdrawal of shares.
Referring to fig. 4, a connected subgraph that changes continuously as current incremental data is added is schematically shown. Fig. 4 includes connected subgraphs at 3 moments taken from the time axis along which the current incremental data changes. Assume that at time t, the connected subgraph G_t is constructed from the baseline data of the historical full data. At time t+1, because three entity data g, h and i and five relationship data 8, 9, 10, 11 and 12 are newly added, the connected subgraph G_t is updated to the connected subgraph G_{t+1}. At time t+2, the attribute of entity data a in the connected subgraph G_{t+1} changes to a', and the attribute of relationship data 1 changes to 1', which indicates that, along with the change in the attribute of entity data a, the investment share of a in b has increased or decreased.
Mode three: the current incremental data is second newly added relationship data, which creates a relationship between entity data in multiple connected subgraphs. Referring to fig. 5, subgraph M and subgraph N are originally two connected subgraphs isolated from each other; when the second newly added relationship data is added, a relationship is created between entity data in subgraph M and entity data in subgraph N.
Mode four: the current incremental data is second newly added entity data, which has relationships with entity data in multiple connected subgraphs. Referring to fig. 6, subgraph M and subgraph N are originally two connected subgraphs isolated from each other; when second newly added entity data X is added, X may have a relationship M with entity data in subgraph M and a relationship N with entity data in subgraph N. It should be noted that the second newly added entity data may also be a new subgraph.
Example embodiment one:
the target field can be a connected sub-graph identification field, and the data record table corresponding to the current incremental data is a pattern recognition result table, can be a pattern recognition entity table, and can also be a pattern recognition relation table. Illustratively, a unique identifier of current incremental data can be obtained, and whether a connected sub-graph identifier field in a mode identification result table where the unique identifier is located is empty or not is judged; and if the connected sub-graph identification field is empty, determining that the current incremental data is the first newly-increased data.
Specifically, when the current incremental data is newly added entity data, the unique identifier of the entity data, such as its id, may be obtained. The row for the entity data in the pattern recognition entity table can be looked up by this id, and it is determined whether the componentId field of that row is empty. If the componentId field is empty, no connected subgraph identifier for the entity data is recorded in the full entity table, that is, the entity data was not included in any earlier connected graph computation, and the entity data can be determined to be first newly added entity data. At this time, the incrPattern field in the pattern recognition entity table may be updated to "1.a", which indicates that the data pattern of the current incremental data is first newly added entity data.
When the current incremental data is newly added relationship data, the unique identifier of the relationship data, such as its id, may be obtained. The row for the relationship data in the pattern recognition relationship table can be looked up by this id, and it is determined whether the componentId field of that row is empty. If the componentId field is empty, no connected subgraph identifier for the relationship data is recorded in the full relationship table, that is, the relationship data was not included in any earlier connected graph computation, and the relationship data can be determined to be first newly added relationship data. At this time, the incrPattern field in the pattern recognition relationship table may be updated to "1.b", which indicates that the data pattern of the current incremental data is first newly added relationship data.
Example embodiment two:
the target field can be a connected sub-graph identification field, a history attribute field and a current attribute field, and the data record table corresponding to the current incremental data is a pattern recognition result table, can be a pattern recognition entity table, and can also be a pattern recognition relation table. Exemplarily, a unique identifier of current incremental data can be obtained, whether a connected sub-graph identifier field in a mode identification result table where the unique identifier is located is empty or not is judged, and whether a current attribute field in the mode identification result table is the same as a historical attribute field or not is judged; and if the connected sub-graph identification field is not empty and the current attribute field is different from the historical attribute field, determining that the current incremental data is changed data.
Specifically, when the current incremental data is entity data, the unique identifier of the entity data, such as its id, may be obtained. The row for the entity data in the pattern recognition entity table can be looked up by this id, and it is determined whether the componentId field of that row is empty and whether its propsMd5 field and histPropsMd5 field are the same. If the componentId field is not empty, a connected subgraph identifier for the entity data is recorded in the full entity table, that is, the entity data was included in an earlier connected graph computation and is not newly added data. If, in addition, the propsMd5 field and the histPropsMd5 field differ, the entity data has changed, and it may be determined to be changed entity data. At this time, the incrPattern field in the pattern recognition entity table may be updated to "2.a", which indicates that the data pattern of the current incremental data is changed entity data.
When the current incremental data is relationship data, the unique identifier of the relationship data, such as its id, may be obtained. The row for the relationship data in the pattern recognition relationship table can be looked up by this id, and it is determined whether the componentId field of that row is empty and whether its propsMd5 field and histPropsMd5 field are the same. If the componentId field is not empty, a connected subgraph identifier for the relationship data is recorded in the full relationship table, that is, the relationship data was included in an earlier connected graph computation and is not newly added data. If, in addition, the propsMd5 field and the histPropsMd5 field differ, the relationship data has changed, and it may be determined to be changed relationship data. At this time, the incrPattern field in the pattern recognition relationship table may be updated to "2.b", which indicates that the data pattern of the current incremental data is changed relationship data.
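The checks of example embodiments one and two can be summarized in a small decision function. This is only a sketch: the function name, the kind parameter, and the sample rows are hypothetical, and an empty field is represented as None.

```python
def classify_basic_pattern(row, kind):
    """Return '1.a'/'1.b' for new rows, '2.a'/'2.b' for changed rows, else None.

    kind is 'entity' or 'relation' (a hypothetical parameter selecting the suffix).
    """
    suffix = "a" if kind == "entity" else "b"
    if row["componentId"] is None:
        return "1." + suffix  # never seen in an earlier connected graph computation
    if row["propsMd5"] != row["histPropsMd5"]:
        return "2." + suffix  # known row whose attributes changed
    return None  # known row with unchanged attributes


new_entity = {"id": "e9", "componentId": None, "propsMd5": "x", "histPropsMd5": None}
changed_relation = {"id": "r1", "componentId": 4, "propsMd5": "new", "histPropsMd5": "old"}
unchanged_entity = {"id": "e1", "componentId": 4, "propsMd5": "same", "histPropsMd5": "same"}

patterns = [
    classify_basic_pattern(new_entity, "entity"),
    classify_basic_pattern(changed_relation, "relation"),
    classify_basic_pattern(unchanged_entity, "entity"),
]
```

The returned value would be written to the incrPattern field of the corresponding row.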
Example embodiment three:
the first newly added relation data in the pattern recognition relation table can be inquired, the target entity data connected with the first newly added relation data is determined, and whether the connected sub-graph identification field corresponding to the target entity data in the full entity table is empty or not is judged. And if the connected sub-graph identification field is not empty, determining that the current incremental data is the second newly-added relation data.
Specifically, the first newly added relationship data whose incrPattern field is "1.b" in the pattern recognition relationship table may be obtained; the identifiers of the target entity data connected by this relationship data are srcId and dstId. It may then be determined whether the componentId field for srcId in the full entity table is empty and whether the componentId field for dstId in the full entity table is empty. If neither componentId field is empty, the entity data connected by the relationship data is not newly added data, and the current incremental data may be determined to be second newly added relationship data.
The reasoning is as follows: the current incremental data has already been determined to be first newly added relationship data. If the entity data it connects are all historical data, then adding the current incremental data creates a relationship between entity data in originally isolated connected subgraphs, and the relationship data that does so is the current incremental data itself; it can therefore be determined to be second newly added relationship data. At this time, the incrPattern field in the pattern recognition relationship table may be updated to "3", which indicates that the data pattern of the current incremental data is second newly added relationship data.
In this example, if it is determined that neither the componentId field in the full entity table where the srcId is located nor the componentId field in the full entity table where the dstId is located is empty, the two componentId fields may be marked, so that subsequent connected subgraphs corresponding to the two componentId fields are processed.
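Example embodiment three can be sketched as follows. The function name and sample rows are hypothetical. The sketch follows the text literally: it checks only that both endpoint componentId fields are non-empty and does not test whether the two componentIds differ.

```python
def mark_second_new_relations(pattern_relations, full_entities):
    """Promote '1.b' relations whose two endpoints are both historical to pattern '3'.

    Returns the set of componentIds marked for later processing.
    """
    component_of = {row["id"]: row["componentId"] for row in full_entities}
    marked = set()
    for rel in pattern_relations:
        if rel["incrPattern"] != "1.b":
            continue
        src_comp = component_of.get(rel["srcId"])  # None if srcId is new or unset
        dst_comp = component_of.get(rel["dstId"])
        if src_comp is not None and dst_comp is not None:
            rel["incrPattern"] = "3"
            marked.update((src_comp, dst_comp))
    return marked


full_entities = [{"id": "e1", "componentId": 1}, {"id": "e2", "componentId": 2}]
pattern_relations = [
    {"id": "r1", "srcId": "e1", "dstId": "e2", "incrPattern": "1.b"},
    {"id": "r2", "srcId": "e1", "dstId": "e9", "incrPattern": "1.b"},  # e9 is new
]
marked = mark_second_new_relations(pattern_relations, full_entities)
```

Here r1 bridges two historical subgraphs and becomes pattern 3, while r2 stays at 1.b because one endpoint has no componentId.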
Example embodiment four:
the first new entity data in the pattern recognition entity table can be queried, and the target relation data connected with the first new entity data in the pattern recognition relation table can be queried, wherein the target relation data is connected with the first new entity data and the corresponding historical entity data. Further, whether a connected sub-graph identification field corresponding to historical entity data in the full entity table is empty or not can be judged, and if the connected sub-graph identification field is not empty, the current incremental data is determined to be second new entity data.
Specifically, first new entity data with an incrPattern field of "1.a" in the pattern recognition entity table may be obtained, and target relationship data with the incrPattern field of "1.b" in the pattern recognition relationship table and connected to the first new entity data may be obtained, where the target relationship data may connect the first new entity data and the corresponding historical entity data. For example, the two entity data of the target relationship data connection are identified as srcId and dstId, respectively, where srcId may be the first added entity data and dstId may be the corresponding historical entity data, and dstId may be the first added entity data and srcId may be the corresponding historical entity data.
When dstId is the corresponding historical entity data, it can be determined whether the componentId field for dstId in the full entity table is empty. If the componentId field is not empty, the entity data identified by dstId is historical data, which means that after the current incremental data (namely the first newly added entity data srcId) is added, a relationship is created between the current incremental data and entity data in a historical connected subgraph, namely the entity data identified by dstId. The current incremental data can therefore be determined to be second newly added entity data.
Similarly, when the srcId is corresponding historical entity data, it may be determined whether a componentId field in a full entity table where the srcId is located is empty, and if the componentId field is not empty, it indicates that the entity data characterized by the srcId is historical data, which indicates that after the current incremental data (i.e., the first newly added entity data dstId) is added, a relationship is generated between the current incremental data and the entity data in the historical connected sub-graph, i.e., a relationship is generated between the current incremental data and the entity data characterized by the srcId, and thus, it may also be determined that the current incremental data is second newly added entity data. At this time, the incrPattern field in the pattern recognition entity table may be updated to be "4", and 4 indicates that the data pattern of the current incremental data is the second new entity data.
In this example, when dstId is the corresponding historical entity data, if it is determined that the componentId field for dstId in the full entity table is not empty, the componentId field may be marked so that the connected subgraph corresponding to it can be processed subsequently. Likewise, when srcId is the corresponding historical entity data, if it is determined that the componentId field for srcId in the full entity table is not empty, the componentId field may be marked so that the connected subgraph corresponding to it can be processed subsequently.
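Example embodiment four can be sketched in the same style. The function name and sample rows are hypothetical; the sketch checks both orientations of each "1.b" relationship, since either srcId or dstId may be the newly added entity.

```python
def mark_second_new_entities(pattern_entities, pattern_relations, full_entities):
    """Promote '1.a' entities linked to historical entities to pattern '4'.

    Returns a dict: new entity id -> set of historical componentIds it touches.
    """
    component_of = {row["id"]: row["componentId"] for row in full_entities}
    new_ids = {e["id"] for e in pattern_entities if e["incrPattern"] == "1.a"}
    touched = {}
    for rel in pattern_relations:
        if rel["incrPattern"] != "1.b":
            continue
        # Try (src new, dst historical) and then (dst new, src historical).
        for new_end, other_end in ((rel["srcId"], rel["dstId"]),
                                   (rel["dstId"], rel["srcId"])):
            hist_comp = component_of.get(other_end)
            if new_end in new_ids and hist_comp is not None:
                touched.setdefault(new_end, set()).add(hist_comp)
    for ent in pattern_entities:
        if ent["id"] in touched:
            ent["incrPattern"] = "4"
    return touched


full_entities = [{"id": "e1", "componentId": 1}, {"id": "e2", "componentId": 2}]
pattern_entities = [
    {"id": "x", "incrPattern": "1.a"},
    {"id": "y", "incrPattern": "1.a"},
]
pattern_relations = [
    {"id": "m", "srcId": "x", "dstId": "e1", "incrPattern": "1.b"},
    {"id": "n", "srcId": "e2", "dstId": "x", "incrPattern": "1.b"},
    {"id": "p", "srcId": "y", "dstId": "x", "incrPattern": "1.b"},  # both ends new
]
touched = mark_second_new_entities(pattern_entities, pattern_relations, full_entities)
```

The returned componentIds play the role of the marked fields described above: they identify the historical subgraphs that must later be merged with the new entity.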
The method and the device identify the data mode of the current incremental data so as to be convenient for carrying out different connected graph calculations on the current incremental data of different data modes subsequently, realize flexible processing on the incremental data of different data modes, and improve the data processing efficiency.
In step S230, a first connected sub-graph corresponding to the current incremental data is determined, and the first connected sub-graph is updated according to the data mode of the current incremental data, so as to obtain a second connected sub-graph.
The incremental data record table can be queried according to the unique identifier of the current incremental data to obtain a connected sub-graph identifier field corresponding to the unique identifier, and a first connected sub-graph corresponding to the current incremental data is determined according to the connected sub-graph identifier field.
When the current incremental data is determined to be first newly added data, the first newly added entity data in the pattern recognition entity table and the first newly added relationship data in the pattern recognition relationship table can be obtained, and a second connected subgraph is constructed from them. For example, the entity data whose incrPattern field is 1.a in the pattern recognition entity table and the relationship data whose incrPattern field is 1.b in the pattern recognition relationship table may be obtained, a connected graph algorithm may be executed on all the obtained entity data and relationship data to obtain the second connected subgraph, and the componentId field of the rows in the full data record table for the entity data and relationship data in the second connected subgraph may be updated to the componentId of the second connected subgraph. In addition, the componentId of the second connected subgraph corresponding to the first newly added data can be recorded in the set S1.
When the current incremental data is determined to be changed data, historical data in the first connected subgraph can be replaced by the changed data, and a second connected subgraph is obtained. Then, the entity data with the incrPattern field of 2.a in the pattern recognition entity table and the relation data with the incrPattern field of 2.b in the pattern recognition relation table can be obtained, the entity data in the full amount entity table is replaced by the entity data in the pattern recognition entity table according to the unique identification of the entity data, and the relation data in the full amount relation table is replaced by the relation data in the pattern recognition relation table according to the unique identification of the relation data. In addition, the componentId of the second connected subgraph corresponding to the changed data may be recorded in the set S2.
When it is determined that the current incremental data is second newly added relationship data, the multiple first connected subgraphs connected by the second newly added relationship data can be determined and merged to obtain a second connected subgraph. For example, if the identifiers of the target entity data connected by the current incremental data are srcId and dstId, the componentIds corresponding to srcId and dstId may be determined, and the multiple first connected subgraphs may be determined according to these componentIds. After the multiple first connected subgraphs are merged to obtain the second connected subgraph, the smallest componentId among the multiple first connected subgraphs may be selected as the identifier of the second connected subgraph. Meanwhile, the componentId of the current incremental data in the full relationship table, the componentId of the target entity data connected by the current incremental data in the full entity table, and the componentId of the current incremental data in the pattern recognition relationship table may all be updated to the componentId of the second connected subgraph.
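The merge for second newly added relationship data can be sketched as a relabeling of rows, with the smallest componentId kept as the identifier of the merged subgraph. The function name and sample rows are hypothetical, and appending the new relationship to the full relationship table is an assumption about how the full table is kept up to date.

```python
def merge_on_new_relation(full_entities, full_relations, new_relation):
    """Merge the subgraphs joined by new_relation and return the merged componentId."""
    component_of = {e["id"]: e["componentId"] for e in full_entities}
    to_merge = {
        component_of[new_relation["srcId"]],
        component_of[new_relation["dstId"]],
    }
    merged_id = min(to_merge)  # the smallest componentId identifies the result
    for row in full_entities + full_relations:
        if row["componentId"] in to_merge:
            row["componentId"] = merged_id
    new_relation["componentId"] = merged_id
    full_relations.append(new_relation)  # assumption: the full table absorbs the new row
    return merged_id


full_entities = [
    {"id": "m1", "componentId": 5},
    {"id": "m2", "componentId": 5},
    {"id": "n1", "componentId": 9},
    {"id": "n2", "componentId": 9},
]
full_relations = [
    {"id": "r1", "srcId": "m1", "dstId": "m2", "componentId": 5},
    {"id": "r2", "srcId": "n1", "dstId": "n2", "componentId": 9},
]
new_relation = {"id": "r3", "srcId": "m2", "dstId": "n1", "componentId": None}
merged_id = merge_on_new_relation(full_entities, full_relations, new_relation)
```

Only rows in the affected subgraphs are relabeled; rows belonging to other subgraphs are left untouched.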
When the current incremental data is determined to be second newly added entity data, a target connected subgraph can be constructed from the first newly added entity data and the target relationship data connected to it, where the target relationship data connects the first newly added entity data with historical entity data. The first connected subgraph corresponding to the historical entity data can then be determined, and the target connected subgraph and the first connected subgraph are merged to obtain a second connected subgraph.
Illustratively, the first newly added entity data whose incrPattern field is "1.a" in the pattern recognition entity table and the target relationship data that is connected to it and whose incrPattern field is "1.b" in the pattern recognition relationship table can be obtained, and a connected graph algorithm is executed on them to obtain the target connected subgraph. When the identifiers of the first newly added entity data and the historical entity data connected by the target relationship data are srcId and dstId, respectively, the componentId corresponding to the identifier dstId of the historical entity data can be determined, and the first connected subgraph corresponding to the historical entity data is determined according to this componentId. After the first connected subgraph corresponding to the historical entity data and the target connected subgraph are merged to obtain the second connected subgraph, the smallest componentId among the first connected subgraph and the target connected subgraph is selected as the identifier of the second connected subgraph. Meanwhile, the componentId of the current incremental data in the full entity table, the componentId of the target relationship data connected to the current incremental data in the full relationship table, and the componentId of the current incremental data in the pattern recognition entity table may all be updated to the componentId of the second connected subgraph.
In addition, when the current incremental data is the second new relationship/entity data, the componentId of the second connected sub-graph corresponding to the second new relationship/entity data may be recorded in the set S3. It should be noted that, after the second connected subgraph corresponding to the current incremental data is obtained, the full-size data record table may also be updated.
The method and the device for calculating the connection diagram based on the incremental data can reduce consumption of calculation resources, improve data processing efficiency, flexibly process the incremental data of different data modes, and further improve the data processing efficiency.
After the second connected subgraph corresponding to the current incremental data is obtained according to steps S210 to S230, stock right penetration may be performed on the second connected subgraph to determine the stock right data of each target entity in the second connected subgraph. It can be understood that, if the current incremental data belongs to a single data mode, stock right penetration may be performed directly on the second connected subgraph corresponding to the current incremental data. If the current incremental data includes incremental data of multiple data modes, and the componentIds of the second connected subgraphs corresponding to the current incremental data form the set S = {S1, S2, S3}, the entity data and the relationship data corresponding to the componentIds recorded in the set S can be obtained from the full data record table, and the stock right penetration calculation is performed on the obtained entity data and relationship data.
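As a rough sketch of the selection step for multiple data modes, the rows whose componentId appears in S1, S2 or S3 can be filtered out of the full data record table. The row layout (dictionaries with `id`, `srcId`, `dstId` and `componentId` keys) and the function name are assumptions for illustration only.

```python
def select_for_penetration(full_entity_table, full_relation_table, *component_id_sets):
    """Return the entity rows and relationship rows whose componentId is in S.

    Each *_table is a list of dict rows carrying a "componentId" key
    (an assumed layout); component_id_sets are the sets S1, S2, S3.
    """
    wanted = set().union(*component_id_sets)  # S = S1 ∪ S2 ∪ S3
    entities = [row for row in full_entity_table if row["componentId"] in wanted]
    relations = [row for row in full_relation_table if row["componentId"] in wanted]
    return entities, relations


entity_rows = [
    {"id": "a", "componentId": 1},
    {"id": "b", "componentId": 2},
    {"id": "c", "componentId": 3},
]
relation_rows = [
    {"srcId": "a", "dstId": "a2", "componentId": 1},
    {"srcId": "c", "dstId": "c2", "componentId": 3},
]
entities, relations = select_for_penetration(entity_rows, relation_rows, {1}, {2}, set())
print([row["id"] for row in entities], len(relations))
```

Only the selected rows then need to be passed to the stock right penetration calculation.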
For example, the data may be grouped by componentId, and the equity relationships in the connected subgraph corresponding to each componentId are calculated separately. The stock right data of each target entity is calculated by iterating successively through the connected subgraph according to the shareholding ratio of each entity. For example, take company A as the target entity: A holds 100% of an intermediate entity (denoted A1 here), A1 holds 50% of the stock rights of B, A directly holds 30% of the stock rights of B, B holds 40% of the stock rights of C, and A directly holds 20% of the stock rights of C. The total shareholding ratio of company A in company C is then 20% + 30% × 40% + 100% × 50% × 40% = 52%, from which it can be seen that company A is the actual controlling shareholder of company C.
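The arithmetic of this example can be checked with a short sketch that sums, over every ownership path, the product of the shareholding ratios along the path. The recursive traversal below is a substitute chosen for brevity rather than the successive iteration of the disclosure, it assumes that the holding graph contains no cycles, and the name A1 for the wholly owned intermediate holder is an assumption chosen so that the three path products add up to the stated 52%.

```python
from fractions import Fraction


def penetrate(holdings, source, target):
    """Total ratio of `target` held by `source`, directly and indirectly.

    holdings maps holder -> {held entity: ratio}. The graph is assumed to be
    acyclic; cross-shareholding loops would need an iterative solution.
    """
    if source == target:
        return Fraction(1)
    total = Fraction(0)
    for held, ratio in holdings.get(source, {}).items():
        # ratio on this edge times everything reachable through `held`
        total += ratio * penetrate(holdings, held, target)
    return total


holdings = {
    "A": {"A1": Fraction(1), "B": Fraction(30, 100), "C": Fraction(20, 100)},
    "A1": {"B": Fraction(50, 100)},  # A1 is a hypothetical intermediate holder
    "B": {"C": Fraction(40, 100)},
}
print(penetrate(holdings, "A", "C"))  # 13/25, i.e. 52%
```

Exact fractions are used so that the result is not affected by floating-point rounding.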
In the service scenario shown in the embodiment of the present disclosure, assume that about 43 million connected subgraphs are obtained by performing connected graph calculation on the historical full data, and that about 1 million pieces of data are newly added every day, where the entity data and relationship data included in the newly added data may: 1) all exist in the 43 million connected subgraphs; 2) all be absent from the 43 million connected subgraphs; 3) partly exist in and partly be absent from the 43 million connected subgraphs; 4) partly exist in the 43 million connected subgraphs and partly cause connected subgraphs to be merged; 5) partly exist in the 43 million connected subgraphs, partly be absent from them, and partly cause connected subgraphs to be merged. It can be seen that the amount of calculation is the largest in case 1). Moreover, if the entity data in the 1 million pieces of data are all independent nodes, 1 million connected subgraphs can be constructed, and stock right penetration subsequently needs to be performed only on these 1 million connected subgraphs. Compared with performing stock right penetration on all 43 million connected subgraphs, the amount of calculation is greatly reduced, a large amount of computing resources is saved, and data processing efficiency is improved, so that connected subgraph calculation based on incremental data is ultimately used to optimize the stock right penetration application.
It should be noted that the data processing method disclosed in the present disclosure may be applied to various scenarios involving a graph mining algorithm in services related to a knowledge graph, and the present disclosure is only described by taking a service scenario in which stock right penetration is calculated in the financial field as an example.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken into multiple step executions, etc.
Further, in the present exemplary embodiment, a data processing apparatus is also provided. The device can be applied to a terminal device or a server. Referring to fig. 7, the data processing apparatus 700 may include a connected subgraph construction module 710, a data pattern determination module 720, and a connected subgraph update module 730, wherein:
a connected subgraph construction module 710, configured to obtain current full-scale data of a target service, and construct a plurality of connected subgraphs according to the current full-scale data, where the current full-scale data at least includes current incremental data;
a data mode determining module 720, configured to identify a target field in the current incremental data based on a data record table corresponding to the current incremental data, so as to determine a data mode of the current incremental data;
and a connected subgraph updating module 730, configured to determine a first connected subgraph corresponding to the current incremental data, and update the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
In an alternative embodiment, the data pattern of the current incremental data in the data processing apparatus 700 includes at least any one of the following:
the current incremental data is first newly-added data, the first newly-added data does not influence the relationship between the connected subgraphs, and the first newly-added data comprises first newly-added entity data and first newly-added relationship data;
the current incremental data is changed data, the changed data does not influence the relation between the connected subgraphs, and the changed data comprises changed entity data and changed relation data;
the current incremental data are second newly added relation data, and the second newly added relation data enable the entity data in the connected subgraph to generate a relation;
and the current incremental data is second newly added entity data, and the second newly added entity data generates a relationship with the entity data in the connected subgraph.
In an alternative embodiment, the current full volume data further comprises historical full volume data; the data processing apparatus 700 further includes:
a full data record table construction module, configured to construct a full data record table according to the entity data and the relationship data in the historical full data, where the full data record table includes a full entity table and a full relationship table;
an incremental data record table construction module, configured to construct an incremental data record table according to the entity data and the relationship data in the current incremental data, where the incremental data record table includes an incremental entity table and an incremental relationship table;
and the pattern recognition result table building module is used for building a pattern recognition result table according to the first field in the full data record table and the second field in the incremental data record table, and the pattern recognition result table comprises a pattern recognition entity table and a pattern recognition relation table.
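One way to picture how the three tables relate is a left join of the incremental entity table onto the full entity table by entity identifier. The sketch below is illustrative only: the field names `historyAttr` and `currentAttr`, the dictionary layouts and the function name are assumptions, and the disclosure does not state that the pattern recognition result table is built in exactly this way.

```python
def build_pattern_entity_table(full_entity_table, incr_entity_table):
    """Left-join incremental entities with their historical records.

    full_entity_table: dict id -> {"componentId": ..., "attr": ...} (assumed).
    incr_entity_table: dict id -> current attribute value (assumed).
    """
    rows = []
    for entity_id, current_attr in incr_entity_table.items():
        history = full_entity_table.get(entity_id)  # None if never seen before
        rows.append({
            "id": entity_id,
            "componentId": history["componentId"] if history else None,
            "historyAttr": history["attr"] if history else None,
            "currentAttr": current_attr,
        })
    return rows


full_entities = {"e1": {"componentId": 5, "attr": "Acme Ltd"}}
incr_entities = {"e1": "Acme Holdings Ltd", "e2": "New Co"}
for row in build_pattern_entity_table(full_entities, incr_entities):
    print(row)
```

An empty componentId in a joined row then signals that the entity has no historical connected subgraph.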
In an optional embodiment, the target field is a connected sub-graph identification field; the data pattern determination module 720 includes:
the first target field judgment module is used for acquiring the unique identifier of the current incremental data and judging whether the connected sub-graph identification field in the pattern recognition result table where the unique identifier is located is empty;
and the first data mode determining module is used for determining that the current incremental data is first newly-added data if the connected sub-graph identification field is empty.
In an optional implementation manner, the target fields are a connected sub-graph identification field, a history attribute field and a current attribute field; the data pattern determination module 720 includes:
the second target field judgment module is used for acquiring the unique identifier of the current incremental data, judging whether the connected sub-graph identification field in the pattern recognition result table where the unique identifier is located is empty, and judging whether the current attribute field in the pattern recognition result table is the same as the historical attribute field;
and the second data mode determining module is used for determining that the current incremental data is changed data if the connected sub-graph identification field is not empty and the current attribute field is different from the historical attribute field.
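The two field checks above (empty connected sub-graph identification field, and current attribute differing from historical attribute) can be sketched as a small classifier over one row of the pattern recognition result table. The row keys and the returned labels are hypothetical names chosen for the example.

```python
def classify_entity_row(row):
    """Classify one pattern recognition row by its target fields.

    row is a dict with "componentId", "historyAttr" and "currentAttr" keys
    (assumed names); the returned labels are illustrative, not from the source.
    """
    if row["componentId"] is None:
        # no historical connected subgraph: first newly-added data
        return "first_new"
    if row["currentAttr"] != row["historyAttr"]:
        # known entity whose attributes differ from history: changed data
        return "changed"
    return "unchanged"


print(classify_entity_row({"componentId": None, "historyAttr": None, "currentAttr": "New Co"}))
print(classify_entity_row({"componentId": 5, "historyAttr": "Acme", "currentAttr": "Acme Ltd"}))
```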
In an alternative embodiment, the data pattern determination module 720 includes:
the target entity data determining module is used for inquiring first newly-added relation data in the pattern recognition relation table and determining target entity data connected with the first newly-added relation data;
a third target field judging module, configured to judge whether a connected sub-graph identification field corresponding to the target entity data in the full entity table is empty;
and the third data mode determining module is used for determining that the current incremental data is second newly-added relationship data if the connected sub-graph identification field is not empty.
In an alternative embodiment, the data pattern determination module 720 includes:
the new entity data determining module is used for inquiring the first new entity data in the pattern recognition entity table;
the target relation data determining module is used for inquiring, in the pattern recognition relation table, target relation data connected with the first newly-added entity data, where the target relation data connects the first newly-added entity data and historical entity data;
a fourth target field judgment module, configured to judge whether a connected sub-graph identifier field corresponding to the historical entity data in the full entity table is empty;
and the fourth data mode determining module is used for determining that the current incremental data is second newly-added entity data if the connected sub-graph identification field is not empty.
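The relationship-driven checks of the last two embodiments can be sketched together as follows. This is a simplified reading under stated assumptions: a relationship whose two endpoints both already have a componentId in the full entity table is treated as second newly-added relationship data, and a relationship joining a newly-added entity to such a historical entity marks second newly-added entity data. The function name, the labels and the two-endpoint rule are assumptions, not wording from the disclosure.

```python
def classify_new_relation(relation, new_entity_ids, full_entity_table):
    """Classify a newly added relationship by the componentId of its endpoints.

    relation:          (srcId, dstId) from the pattern recognition relation table.
    new_entity_ids:    set of ids of first newly-added entities.
    full_entity_table: dict id -> componentId (None or missing means empty).
    """
    endpoints = relation
    historical = [e for e in endpoints if full_entity_table.get(e) is not None]
    newly_added = [e for e in endpoints if e in new_entity_ids]
    if newly_added and historical:
        # a new entity is attached to an existing connected subgraph
        return "second_new_entity"
    if len(historical) == 2:
        # a new edge between entities that already belong to subgraphs
        return "second_new_relationship"
    return "first_new"


full_entities = {"h1": 7, "h2": 9}
print(classify_new_relation(("n1", "h1"), {"n1"}, full_entities))
print(classify_new_relation(("h1", "h2"), {"n1"}, full_entities))
print(classify_new_relation(("n1", "n2"), {"n1", "n2"}, full_entities))
```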
In an alternative embodiment, the current incremental data is first newly-added data; the connected subgraph update module 730 includes:
the newly added data acquisition module is used for acquiring first newly-added entity data in the pattern recognition entity table and first newly-added relation data in the pattern recognition relation table;
and the second connected subgraph constructing module is used for constructing the second connected subgraph according to the first newly-added entity data and the first newly-added relation data.
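Constructing connected subgraphs from only the first newly-added entity data and relationship data can be sketched with a union-find structure. Union-find is a stand-in here for whatever connected graph algorithm an implementation uses, and labelling each subgraph with its smallest member identifier is an assumption made for the example.

```python
def connected_components(entity_ids, relations):
    """Label each entity with the smallest entity id in its connected subgraph.

    entity_ids: iterable of comparable ids; relations: list of (srcId, dstId)
    whose endpoints all appear in entity_ids.
    """
    parent = {e: e for e in entity_ids}

    def find(x):
        # follow parent links to the root, halving the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in relations:
        a, b = find(src), find(dst)
        if a != b:
            # keep the smaller id as root so the root is the component minimum
            if a < b:
                parent[b] = a
            else:
                parent[a] = b
    return {e: find(e) for e in parent}


labels = connected_components([1, 2, 3, 4, 5], [(2, 3), (4, 5), (3, 5)])
print(labels)  # {1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
```

Because only the newly added rows are passed in, the cost depends on the size of the increment rather than on the historical full data.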
In an optional embodiment, the current incremental data is changed data; the connected subgraph updating module 730 is configured to replace the historical data in the first connected subgraph with the changed data to obtain the second connected subgraph.
In an optional embodiment, the current incremental data is second newly added relation data; the connected subgraph update module 730 includes:
the first connected subgraph determining module is used for determining a plurality of first connected subgraphs connected by the second newly-added relationship data;
and the second connected subgraph generation module is used for merging the plurality of first connected subgraphs to obtain the second connected subgraph.
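Merging the first connected subgraphs joined by second newly-added relationship data can be sketched as a relabelling of componentIds. The sketch assumes that the merged subgraph keeps the smallest of the merged componentIds and that the full entity table is a dictionary from entity identifier to componentId; both are assumptions for illustration.

```python
def merge_by_new_relations(full_entity_table, new_relations):
    """Merge the connected subgraphs joined by each new relationship.

    full_entity_table: dict id -> componentId; both endpoints of every
    relationship are assumed to exist in it already.
    Returns the set of componentIds of the resulting merged subgraphs.
    """
    merged_ids = set()
    for src, dst in new_relations:
        joined = {full_entity_table[src], full_entity_table[dst]}
        merged_id = min(joined)  # assumed naming rule for the merged subgraph
        for entity_id, component_id in list(full_entity_table.items()):
            if component_id in joined:
                full_entity_table[entity_id] = merged_id
        # an earlier result may have been absorbed by this merge
        merged_ids -= joined
        merged_ids.add(merged_id)
    return merged_ids


full_entities = {"a": 4, "b": 4, "c": 9, "d": 2, "e": 11}
print(merge_by_new_relations(full_entities, [("b", "c"), ("c", "d")]))
print(full_entities)
```

The returned set plays the role of the componentIds that would be recorded for later stock right penetration.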
In an optional embodiment, the current incremental data is second new entity data; the connected subgraph updating module 730 comprises:
the target connected subgraph construction module is used for constructing a target connected subgraph according to first newly-added entity data and target relation data connected with the first newly-added entity data, where the target relation data connects the first newly-added entity data and historical entity data;
the first connected subgraph determining module is used for determining a first connected subgraph corresponding to the historical entity data;
and the second connected subgraph generation module is used for merging the target connected subgraph and the first connected subgraph to obtain a second connected subgraph.
In an alternative embodiment, the data processing apparatus 700 further includes:
and the stock right penetrating module is used for performing stock right penetrating on the second connected subgraph so as to determine the stock right data of each target entity in the second connected subgraph.
The specific details of each module in the data processing apparatus have been described in detail in the corresponding data processing method, and therefore are not described herein again.
Further, in this exemplary embodiment, a stock right penetration device is also provided. The device can be applied to a terminal device or a server. Referring to fig. 8, the stock right penetration device 800 may include a connected subgraph acquisition module 810 and a stock right data determination module 820, wherein:
a connected subgraph acquisition module 810, configured to obtain the second connected subgraph from the data processing apparatus 700;
and a stock right data determination module 820, configured to perform stock right penetration on the second connected subgraph to determine the stock right data of each target entity in the second connected subgraph.
Each module in the above apparatus may be a general-purpose processor, including: a central processing unit, a network processor, etc.; but may also be a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The modules may also be implemented in software, firmware, etc. The processors in the above device may be independent processors or may be integrated together.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing an electronic device to perform the steps according to various exemplary embodiments of the disclosure as described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the electronic device. The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above method. An electronic device 900 according to this exemplary embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, electronic device 900 may take the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one memory unit 920, a bus 930 that connects the various system components (including the memory unit 920 and the processing unit 910), and a display unit 940.
The storage unit 920 stores program code, which may be executed by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification. For example, processing unit 910 may perform the method steps in fig. 2.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 960. As shown in FIG. 9, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown in FIG. 9, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (16)
1. A data processing method, comprising:
acquiring current full data of a target service, and constructing a plurality of connected subgraphs according to the current full data, wherein the current full data at least comprises current incremental data;
identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data;
and determining a first connected subgraph corresponding to the current incremental data, and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
2. The data processing method of claim 1, wherein the data pattern of the current incremental data comprises at least any one of:
the current incremental data is first newly-added data, the first newly-added data does not influence the relationship between the connected subgraphs, and the first newly-added data comprises first newly-added entity data and first newly-added relationship data;
the current incremental data is changed data, the changed data does not influence the relationship between the connected subgraphs, and the changed data comprises changed entity data and changed relationship data;
the current incremental data is second newly-added relationship data, and the second newly-added relationship data causes entity data in the connected subgraphs to generate a relationship;
and the current incremental data is second newly-added entity data, and the second newly-added entity data generates a relationship with entity data in the connected subgraphs.
3. The data processing method of claim 1, wherein the current full data further comprises historical full data; after obtaining the current full data of the target service, the method further comprises:
constructing a full data record table according to the entity data and the relation data in the historical full data, wherein the full data record table comprises a full entity table and a full relation table;
constructing an incremental data record table according to the entity data and the relationship data in the current incremental data, wherein the incremental data record table comprises an incremental entity table and an incremental relationship table;
and constructing a pattern recognition result table according to the first field in the full data record table and the second field in the incremental data record table, wherein the pattern recognition result table comprises a pattern recognition entity table and a pattern recognition relation table.
4. The data processing method of claim 3, wherein the target field is a connected sub-graph identification field; the identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data includes:
acquiring a unique identifier of the current incremental data, and judging whether the connected sub-graph identification field in the pattern recognition result table where the unique identifier is located is empty;
and if the connected sub-graph identification field is empty, determining that the current incremental data is first newly-added data.
5. The data processing method of claim 3, wherein the target fields are a connected sub-graph identification field, a history attribute field, and a current attribute field; the identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data to determine a data mode of the current incremental data includes:
acquiring a unique identifier of the current incremental data, judging whether the connected sub-graph identification field in the pattern recognition result table where the unique identifier is located is empty, and judging whether the current attribute field in the pattern recognition result table is the same as the historical attribute field;
and if the connected sub-graph identification field is not empty and the current attribute field is different from the historical attribute field, determining that the current incremental data is changed data.
6. The data processing method of claim 3, wherein the identifying a target field in the current incremental data to determine the data mode of the current incremental data based on a data record table corresponding to the current incremental data comprises:
inquiring first newly-added relation data in a pattern recognition relation table, and determining target entity data connected with the first newly-added relation data;
judging whether a connected sub-graph identification field corresponding to the target entity data in the full entity table is empty or not;
and if the connected sub-graph identification field is not empty, determining that the current incremental data is second newly-added relation data.
7. The data processing method of claim 3, wherein the identifying a target field in the current incremental data to determine the data mode of the current incremental data based on a data record table corresponding to the current incremental data comprises:
querying first newly-added entity data in the pattern recognition entity table;
querying, in the pattern recognition relationship table, target relationship data connected with the first newly-added entity data, wherein the target relationship data connects the first newly-added entity data and historical entity data;
judging whether a connected sub-graph identification field corresponding to the historical entity data in the full entity table is empty or not;
and if the connected sub-graph identification field is not empty, determining that the current incremental data is second newly-added entity data.
8. The data processing method of any of claims 1-7, wherein the current incremental data is first newly-added data; updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph comprises:
acquiring first newly-added entity data in a pattern recognition entity table and first newly-added relation data in a pattern recognition relation table;
and constructing the second connected subgraph according to the first newly-added entity data and the first newly-added relation data.
9. The data processing method of any of claims 1-7, wherein the current incremental data is changed data; updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph comprises:
and replacing the historical data in the first connected subgraph with the changed data to obtain a second connected subgraph.
10. The data processing method according to any one of claims 1 to 7, wherein the current incremental data is second newly-added relationship data; updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph comprises:
determining a plurality of first connected subgraphs connected by the second newly-added relationship data;
and merging the plurality of first connected subgraphs to obtain the second connected subgraph.
11. The data processing method according to any one of claims 1 to 7, wherein the current incremental data is second newly-added entity data; updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph comprises:
constructing a target connected subgraph according to first newly-added entity data and target relation data connected with the first newly-added entity data, wherein the target relation data connects the first newly-added entity data and historical entity data;
determining a first connected subgraph corresponding to the historical entity data;
and merging the target connected subgraph and the first connected subgraph to obtain the second connected subgraph.
12. A stock right penetration method, comprising:
obtaining a second connected subgraph according to any of claims 1-11;
and carrying out stock right penetration on the second connected subgraph to determine the stock right data of each target entity in the second connected subgraph.
13. A data processing apparatus, comprising:
the connected subgraph construction module is used for acquiring current full data of a target service and constructing a plurality of connected subgraphs according to the current full data, wherein the current full data at least comprises current incremental data;
the data mode determining module is used for identifying a target field in the current incremental data based on a data record table corresponding to the current incremental data so as to determine the data mode of the current incremental data;
and the connected subgraph updating module is used for determining a first connected subgraph corresponding to the current incremental data and updating the first connected subgraph according to the data mode of the current incremental data to obtain a second connected subgraph.
14. A stock right penetration device, comprising:
a connected subgraph acquisition module for acquiring a second connected subgraph according to any one of claims 1 to 11;
and the stock right data determining module is used for performing stock right penetration on the second connected subgraph to determine the stock right data of each target entity in the second connected subgraph.
15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 12.
16. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 12 by executing the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210981624.0A CN115328993A (en) | 2022-08-15 | 2022-08-15 | Data processing method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210981624.0A CN115328993A (en) | 2022-08-15 | 2022-08-15 | Data processing method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115328993A (en) | 2022-11-11 |
Family ID: 83922949
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210981624.0A Pending CN115328993A (en) | 2022-08-15 | 2022-08-15 | Data processing method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115328993A (en) |
- 2022-08-15: CN application CN202210981624.0A (CN115328993A), active, Pending
Similar Documents
Publication | Title |
---|---|
US10769122B2 (en) | Specifying and applying logical validation rules to data |
CN102945240B (en) | Method and device for realizing association rule mining algorithm supporting distributed computation |
CN111709527A (en) | Operation and maintenance knowledge map library establishing method, device, equipment and storage medium |
CN110472068A (en) | Big data processing method, equipment and medium based on heterogeneous distributed knowledge mapping |
CN111813963A (en) | Knowledge graph construction method and device, electronic equipment and storage medium |
CN113760891B (en) | Data table generation method, device, equipment and storage medium |
CN108897874B (en) | Method and apparatus for processing data |
CN106557307B (en) | Service data processing method and system |
CN115827895A (en) | Vulnerability knowledge graph processing method, device, equipment and medium |
CN110019116A (en) | Data traceability method, apparatus, data processing equipment and computer storage medium |
CN114461644A (en) | Data acquisition method and device, electronic equipment and storage medium |
CN112650909A (en) | Product display method and device, electronic equipment and storage medium |
CN113076729A (en) | Method and system for importing report, readable storage medium and electronic equipment |
US20240037084A1 (en) | Method and apparatus for storing data |
US20200349128A1 (en) | Clustering within database data models |
JP2023553220A (en) | Process mining for multi-instance processes |
CN115878589A (en) | Version management method and device of structured data and related equipment |
CN113687825A (en) | Software module construction method, device, equipment and storage medium |
CN111046085B (en) | Data tracing processing method and device, medium and equipment |
CN117390011A (en) | Report data processing method, device, computer equipment and storage medium |
CN116955856A (en) | Information display method, device, electronic equipment and storage medium |
CN113570464B (en) | Digital currency transaction community identification method, system, equipment and storage medium |
CN115328993A (en) | Data processing method and device, storage medium and electronic equipment |
CN116127154A (en) | Knowledge tag recommendation method and device, electronic equipment and storage medium |
CN115114297A (en) | Data lightweight storage and search method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||