CN112527816B - Data blood relationship analysis method, system, computer equipment and storage medium - Google Patents


Info

Publication number
CN112527816B
Authority
CN
China
Legal status
Active
Application number
CN202011408452.5A
Other languages
Chinese (zh)
Other versions
CN112527816A (en)
Inventor
皮天远
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011408452.5A
Publication of CN112527816A
Priority to PCT/CN2021/083128 (WO2022116425A1)
Application granted
Publication of CN112527816B


Classifications

    • G06F16/2433 Query languages
    • G06F16/219 Managing data history or versioning
    • G06F16/24556 Aggregation; Duplicate elimination
    • G06F21/602 Providing cryptographic facilities or services
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiments of the present application belong to the technical field of big data, can be applied to the smart healthcare field, and relate to a data lineage analysis method, system, computer device and storage medium. In the method, a plurality of pre-deployed servers synchronously receive logs distributed by a log distribution end; each server obtains the structured query script in a log and processes each structured query statement in the script with a preset encryption algorithm to obtain a statement value, the algorithm always producing the same statement value for the same structured query statement; the server obtains the historical statement values stored in a database and checks whether any historical statement value is identical to the statement value, and if none is, it parses the structured query script to obtain a lineage result; finally, the server stores the statement value in the database and stores the lineage result in a preset result table. The result table may be stored in a blockchain. The method and system can quickly determine whether a structured query statement has already been parsed.

Description

Data blood relationship analysis method, system, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular to a data lineage analysis method, system, computer device, and storage medium.
Background
As data is generated, processed, fused, circulated and retired, relationships form between data items; these relationships are called the lineage (literally, the "blood relationship") of the data. As computer technology develops and data volumes keep growing, lineage analysis becomes increasingly important: it makes data fusion and processing traceable, so that, starting from a given data object, all related metadata objects and the relationships among them can be found.
Currently, lineage analysis of structured query statements requires comparing each statement to be checked against the statements that have already been parsed in order to determine whether it has been parsed. This makes it difficult for a computer to determine quickly whether lineage analysis of a given structured query statement has been completed, which slows data processing.
Disclosure of Invention
The embodiments of the present application aim to provide a data lineage analysis method, system, computer device and storage medium that can quickly determine whether a structured query statement has already been parsed.
To solve the above technical problems, an embodiment of the present application provides a data lineage analysis method, which adopts the following technical solution:
A data lineage analysis method comprises the following steps:
a plurality of pre-deployed servers synchronously receive logs distributed by a log distribution end;
each server obtains the structured query script in a log and processes each structured query statement in the script with a preset encryption algorithm to obtain a statement value, wherein the encryption algorithm yields the same statement value for identical structured query statements;
the server obtains the historical statement values stored in a database and checks whether a historical statement value identical to the current statement value exists; if none exists, the server parses the structured query statement corresponding to the current statement value to obtain a lineage result;
and the server stores the statement value in the database and stores the lineage result in a preset result table.
Further, before the step in which the plurality of servers synchronously receive the logs, the method further comprises:
the log distribution end identifies the number of logs currently being processed by each of the plurality of pre-deployed servers;
the log distribution end obtains a plurality of logs and distributes them to different servers based on those numbers.
Further, the step in which the log distribution end obtains a plurality of logs to be distributed and distributes them to different servers based on the numbers of logs comprises:
the log distribution end assigns logs, one at a time, to the server currently processing the fewest logs, until all logs have been distributed or every server is processing the same number of logs;
and the log distribution end checks whether any logs remain unassigned and, if so, distributes them evenly among the servers.
Further, after the step in which the server stores the statement value in the database and stores the lineage result in the preset result table, the method further comprises:
the current server receives a first lineage-analysis completion signal carrying a first log identifier;
the current server checks whether the difference between the number of deployed servers and the number of first lineage-analysis completion signals it has received is one;
when the difference is one, the current server obtains the log identifier carried by the log it is currently processing, and when this current log identifier differs from the first log identifier, the current server performs a full deduplication of the lineage results in the result table;
when the difference is not one, the current server obtains the log identifier carried by the log it is currently processing, and when this current log identifier differs from the first log identifier, it sends a second lineage-analysis completion signal carrying the first log identifier to all servers.
Further, the step of processing the structured query statements in the structured query script with a preset encryption algorithm to obtain statement values comprises:
processing each structured query statement in the structured query script with the MD5 algorithm to obtain its statement value.
Further, the step of parsing the structured query statement corresponding to the current statement value to obtain the lineage result comprises:
performing a preliminary lineage parsing operation on the structured query statement to obtain an information syntax tree;
extracting, from the information syntax tree, a preset target together with its source database, source table and source field, and associating the target with its source database, source table and source field to obtain the lineage result.
Further, after the step in which the server checks whether a historical statement value in the database is identical to the statement value, the method further comprises:
when a historical statement value identical to the statement value exists, determining the record time of the lineage result mapped to that historical statement value and updating the record time to the current time.
To solve the above technical problems, an embodiment of the present application further provides a data lineage analysis system, which adopts the following technical solution:
the data lineage analysis system comprises a plurality of servers and a log distribution end, wherein each server comprises a receiving module, an encryption module, a comparison module and a storage module;
the receiving module is used for receiving the logs distributed by the log distribution end;
the encryption module is used for obtaining the structured query script in a log and processing each structured query statement in the script with a preset encryption algorithm to obtain a statement value, wherein the encryption algorithm yields the same statement value for identical structured query statements;
the comparison module is used for obtaining the historical statement values stored in the database, checking whether a historical statement value identical to the current statement value exists, and, if none exists, parsing the structured query statement corresponding to the current statement value to obtain a lineage result;
the storage module is used for storing the statement values in the database and storing the lineage results in a preset result table.
To solve the above technical problems, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
a computer device comprises a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps of the data lineage analysis method described above when executing the computer-readable instructions.
To solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
a computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the data lineage analysis method described above.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: the logs distributed by the log distribution end are received synchronously by a plurality of pre-deployed servers, which allows the logs to be processed in parallel and prevents them from accumulating. Each server processes the structured query statements with a preset encryption algorithm to obtain statement values, and by checking whether a statement value equals a historical statement value it can quickly determine whether the structured query statement corresponding to the current statement value has already undergone lineage analysis. If no identical historical statement value exists, the server parses the structured query script to obtain a lineage result, so the complete lineage is obtained by parsing the structured query statement directly. Comparing statement values with historical statement values therefore allows a quick determination and avoids parsing the same structured query statement repeatedly.
Drawings
For a clearer description of the solutions in the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a data lineage resolution method according to the present application;
FIG. 3 is a schematic diagram of one embodiment of a data lineage resolution system according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. computer device; 201. memory; 202. processor; 203. network interface; 300. server; 301. receiving module; 302. encryption module; 303. comparison module; 304. storage module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
The data lineage analysis method of the present application is applied to a data lineage analysis system. As shown in FIG. 1, the system architecture includes servers and a log distribution end. The servers are connected to one another and to the log distribution end through a network, which may include various connection types such as wired links, wireless communication links or fiber-optic cables.
The log distribution end may be any of various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers and desktop computers.
It should be understood that the numbers of servers and log distribution ends in FIG. 1 are merely illustrative; there may be any number of servers and log distribution ends, as required by the implementation.
With continued reference to FIG. 2, a flowchart of one embodiment of the data lineage analysis method according to the present application is shown. The data lineage analysis method comprises the following steps:
S1: the plurality of pre-deployed servers synchronously receive the logs distributed by the log distribution end.
In this embodiment, having a plurality of servers receive and process logs synchronously allows a large volume of logs to be handled, which avoids log accumulation and enables fast log analysis.
In this embodiment, the server may receive the logs distributed by the log distribution end over a wired or wireless connection. It should be noted that wireless connections may include, but are not limited to, 3G/4G, Wi-Fi, Bluetooth, WiMAX, Zigbee and UWB (ultra-wideband) connections, as well as other wireless connection methods now known or developed in the future.
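Step S1 can be pictured with a minimal single-machine sketch in which each "server" is a Python thread with its own inbound queue. This is an assumption made purely for illustration: the patent does not specify the transport, real servers would receive logs over the network, and the round-robin hand-out below is a simplified stand-in for the load-based distribution described later. The server names are hypothetical.

```python
import queue
import threading

SERVER_NAMES = ("server-1", "server-2", "server-3")  # hypothetical server names


def server_worker(name, inbox, processed, lock):
    """Consume logs from this server's inbox until a None sentinel arrives."""
    while True:
        log = inbox.get()
        if log is None:  # sentinel: the distribution end has no more logs for us
            break
        # Placeholder for steps S2-S4 (statement values, comparison, parsing, storage).
        with lock:
            processed.append((name, log))


inboxes = {name: queue.Queue() for name in SERVER_NAMES}
processed = []
lock = threading.Lock()
workers = [
    threading.Thread(target=server_worker, args=(name, inbox, processed, lock))
    for name, inbox in inboxes.items()
]
for worker in workers:
    worker.start()

# Simplified log distribution end: round-robin here, load-based in the patent.
for i in range(9):
    inboxes[SERVER_NAMES[i % len(SERVER_NAMES)]].put(f"log-{i}")
for inbox in inboxes.values():
    inbox.put(None)  # one sentinel per server
for worker in workers:
    worker.join()
```

Because each server drains its own queue concurrently, logs are consumed as they arrive instead of piling up behind a single processor.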
S2: the server obtains the structured query script in the log and processes each structured query statement in the script with a preset encryption algorithm to obtain statement values.
In this embodiment, each structured query statement is processed with the encryption algorithm to obtain a corresponding statement value, and the subsequent comparison is performed on the statement values.
Specifically, in step S2, the step of processing the structured query statements in the structured query script with a preset encryption algorithm to obtain statement values comprises:
processing each structured query statement in the structured query script with the MD5 algorithm to obtain its statement value.
In this embodiment, the MD5 algorithm (MD5 Message-Digest Algorithm) is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is commonly used to verify that transmitted information has not been altered. MD5 is simple and fast to compute, which keeps data processing efficient, and it always produces the same value for the same input string, which makes it suitable for the present application.
It should be noted that in practical applications other algorithms may be used to process the structured query statements, provided that the algorithm produces the same statement value whenever the same structured query statement is processed.
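A minimal sketch of step S2 in Python using the standard `hashlib` module. The whitespace normalization and the naive semicolon splitting are assumptions not described in the patent; without some normalization, statements that differ only in spacing would receive different statement values. Strictly speaking, MD5 is a hash function rather than encryption; what the method relies on is that it is deterministic.

```python
import hashlib
import re


def normalize_sql(statement: str) -> str:
    """Collapse whitespace runs and drop a trailing semicolon (assumed step)."""
    collapsed = re.sub(r"\s+", " ", statement).strip()
    return collapsed.rstrip(";").strip()


def statement_value(statement: str) -> str:
    """Return the 32-character hex MD5 digest of a normalized SQL statement."""
    return hashlib.md5(normalize_sql(statement).encode("utf-8")).hexdigest()


def split_script(script: str) -> list[str]:
    """Naively split a script on ';' (does not handle ';' inside string literals)."""
    return [s.strip() for s in script.split(";") if s.strip()]


script = """
INSERT INTO dw.sales_summary SELECT region, SUM(amount) FROM ods.orders GROUP BY region;
INSERT INTO dw.sales_summary   SELECT region, SUM(amount) FROM ods.orders GROUP BY region;
"""
values = [statement_value(s) for s in split_script(script)]
print(values[0] == values[1])  # True: same statement, same statement value
print(len(values[0]))          # 32 hex characters = 128 bits
```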
S3: the server obtains the historical statement values stored in the database and checks whether a historical statement value identical to the current statement value exists; if none exists, it parses the structured query statement corresponding to the current statement value to obtain a lineage result.
In this embodiment, if no historical statement value identical to the current statement value exists, it can be quickly determined that the corresponding structured query statement has not yet been parsed, so the statement is parsed to obtain its lineage result.
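The check in step S3 can be sketched as a set-membership test on statement values. In this sketch the in-memory set `historical_values` stands in for the values loaded from the database, and `parse_lineage` is a hypothetical parser supplied by the caller; neither name comes from the patent. Adding each new value to the set mirrors step S4, so a duplicate later in the same batch is also skipped.

```python
import hashlib
from typing import Callable, Iterable

Lineage = tuple[str, str]  # (source, target) pair; a simplification for this sketch


def md5_value(statement: str) -> str:
    """Statement value: hex MD5 digest of the statement text."""
    return hashlib.md5(statement.encode("utf-8")).hexdigest()


def resolve_new_statements(
    statements: Iterable[str],
    historical_values: set[str],
    parse_lineage: Callable[[str], list[Lineage]],
) -> list[Lineage]:
    """Parse only the statements whose value is not yet in historical_values."""
    results: list[Lineage] = []
    for statement in statements:
        value = md5_value(statement)
        if value in historical_values:  # average O(1) lookup on a fixed-length key
            continue                    # already parsed: skip re-parsing
        results.extend(parse_lineage(statement))
        historical_values.add(value)    # mirrors step S4: remember the new value
    return results


# Usage with a stub parser that records how often it is called.
parse_calls: list[str] = []


def stub_parser(sql: str) -> list[Lineage]:
    parse_calls.append(sql)
    return [("ods.orders", "dw.sales_summary")]


history = {md5_value("INSERT INTO dw.a SELECT * FROM ods.b")}
lineage = resolve_new_statements(
    [
        "INSERT INTO dw.a SELECT * FROM ods.b",                   # seen before
        "INSERT INTO dw.sales_summary SELECT * FROM ods.orders",  # new
        "INSERT INTO dw.sales_summary SELECT * FROM ods.orders",  # duplicate in batch
    ],
    history,
    stub_parser,
)
print(len(parse_calls))  # 1
```

The speed-up comes from comparing short, fixed-length digests (here through a hash set) instead of comparing full statement texts against every previously parsed statement.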
Specifically, in step S3, the step of parsing the structured query statement corresponding to the current statement value to obtain the lineage result comprises:
performing a preliminary lineage parsing operation on the structured query statement to obtain an information syntax tree;
extracting, from the information syntax tree, a preset target together with its source database, source table and source field, and associating the target with its source database, source table and source field to obtain the lineage result.
In this embodiment, the information syntax tree is obtained by a preliminary lineage parsing operation on the structured query statement. From this syntax tree, the preset target, its source database, source table, source field and their association relationships can be extracted directly, and the lineage result is generated from these associations.
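The patent does not specify the structure of the information syntax tree. The sketch below assumes a hypothetical, already-parsed tree for an `INSERT ... SELECT` statement and shows only the extraction step: each target column is associated with the source database, source table and source field it reads. All class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SourceColumn:
    database: str  # source database ("source library")
    table: str     # source table
    name: str      # source field


@dataclass
class TargetColumn:
    name: str
    sources: list[SourceColumn] = field(default_factory=list)


@dataclass
class InsertSelectTree:
    """Hypothetical result of the preliminary parse of INSERT INTO ... SELECT ..."""
    target_database: str
    target_table: str
    columns: list[TargetColumn]


@dataclass(frozen=True)
class LineageRecord:
    target: str
    source_database: str
    source_table: str
    source_field: str


def extract_lineage(tree: InsertSelectTree) -> list[LineageRecord]:
    """Associate every target column with each source database/table/field it reads."""
    records = []
    for column in tree.columns:
        target = f"{tree.target_database}.{tree.target_table}.{column.name}"
        for src in column.sources:
            records.append(LineageRecord(target, src.database, src.table, src.name))
    return records


# Tree assumed for:
# INSERT INTO dw.sales_summary
# SELECT o.region, SUM(o.amount) AS total FROM ods.orders o GROUP BY o.region
tree = InsertSelectTree("dw", "sales_summary", [
    TargetColumn("region", [SourceColumn("ods", "orders", "region")]),
    TargetColumn("total", [SourceColumn("ods", "orders", "amount")]),
])
for record in extract_lineage(tree):
    print(record)
```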
S4: the server stores the statement value in the database and stores the lineage result in the preset result table.
In this embodiment, storing the statement value in the database extends the set of historical statement values, and storing the lineage result in the result table allows it to be looked up there directly when needed.
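One way to persist step S4, sketched with Python's built-in `sqlite3` module. The table names `statement_value` and `lineage_result`, their columns, and storing `record_time` as Unix seconds are assumptions; the patent names only a database for statement values and a preset result table. Writing both in one transaction avoids recording a statement value whose lineage result was never saved.

```python
import hashlib
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS statement_value (
    value TEXT PRIMARY KEY              -- MD5 hex digest of an already-parsed statement
);
CREATE TABLE IF NOT EXISTS lineage_result (
    target          TEXT NOT NULL,
    source_database TEXT NOT NULL,
    source_table    TEXT NOT NULL,
    source_field    TEXT NOT NULL,
    statement_value TEXT NOT NULL,      -- maps the record back to its statement value
    record_time     REAL NOT NULL       -- Unix timestamp of the last write or refresh
);
"""


def store_result(conn, value, records):
    """Store one statement value and its lineage records in a single transaction."""
    now = time.time()
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("INSERT OR IGNORE INTO statement_value (value) VALUES (?)", (value,))
        conn.executemany(
            "INSERT INTO lineage_result VALUES (?, ?, ?, ?, ?, ?)",
            [(*record, value, now) for record in records],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sql = "INSERT INTO dw.sales_summary SELECT region, SUM(amount) FROM ods.orders GROUP BY region"
value = hashlib.md5(sql.encode("utf-8")).hexdigest()
store_result(conn, value, [
    ("dw.sales_summary.region", "ods", "orders", "region"),
    ("dw.sales_summary.total", "ods", "orders", "amount"),
])
```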
After step S3, that is, after the step in which the server checks whether the statement value is identical to a historical statement value in the database, the method further comprises:
when a historical statement value identical to the statement value exists, determining the record time of the lineage result mapped to that historical statement value and updating the record time to the current time.
In this embodiment, when a historical statement value identical to the statement value exists, the corresponding structured query statement is determined to have been parsed already and does not need to be parsed again. However, the record time of the lineage result mapped to that historical statement value is updated to the current time, which facilitates later lookup and use and prevents the lineage result from being deleted by other cleanup operations because its record time is too old.
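A sketch of the record-time refresh, again with `sqlite3` and a simplified, assumed `lineage_result` table holding sample data. The `purge_stale` function is a hypothetical stand-in for the "other cleanup operations" mentioned above; it shows why the refresh matters: a lineage result whose statement keeps reappearing is not deleted as stale.

```python
import sqlite3
import time

DAY = 86_400  # seconds

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineage_result (target TEXT, source_table TEXT,"
    " statement_value TEXT, record_time REAL)"
)
# Sample lineage record first written 30 days ago for statement value 'abc123'.
conn.execute(
    "INSERT INTO lineage_result VALUES ('dw.a', 'ods.b', 'abc123', ?)",
    (time.time() - 30 * DAY,),
)
conn.commit()


def touch_lineage(conn, value):
    """Set record_time to now for every lineage record mapped to `value`; return rows updated."""
    with conn:
        cur = conn.execute(
            "UPDATE lineage_result SET record_time = ? WHERE statement_value = ?",
            (time.time(), value),
        )
    return cur.rowcount


def purge_stale(conn, max_age_days):
    """Hypothetical cleanup job: delete lineage records older than max_age_days."""
    cutoff = time.time() - max_age_days * DAY
    with conn:
        cur = conn.execute("DELETE FROM lineage_result WHERE record_time < ?", (cutoff,))
    return cur.rowcount


print(touch_lineage(conn, "abc123"))      # 1: historical value hit, record refreshed
print(purge_stale(conn, max_age_days=7))  # 0: the refreshed record is kept
```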
After step S4, that is, after the step in which the server stores the statement value in the database and stores the lineage result in the preset result table, the method further comprises:
the current server receives a first lineage-analysis completion signal carrying a first log identifier;
the current server checks whether the difference between the number of deployed servers and the number of first lineage-analysis completion signals it has received is one;
when the difference is one, the current server obtains the log identifier carried by the log it is currently processing, and when this current log identifier differs from the first log identifier, the current server performs a full deduplication of the lineage results in the result table;
when the difference is not one, the current server obtains the log identifier carried by the log it is currently processing, and when this current log identifier differs from the first log identifier, it sends a second lineage-analysis completion signal carrying the first log identifier to all servers.
In this embodiment, the first lineage-analysis completion signal carrying the first log identifier comes from another server: when another server finishes analyzing the logs of the current scenario that were assigned to it, it sends this signal to the current server. If the current log identifier is the same as the first log identifier, the current server determines that it has not yet finished the logs of the current scenario and continues with the next log. After the servers obtain their log data, each processes its data independently. When a server finishes all logs of the current scenario, it notifies the other servers that it has finished, that is, it sends them a lineage-analysis completion signal; this continues until the last server finishes, and only that last server performs the full deduplication of the lineage relationships in the result table, so the servers do not deduplicate repeatedly.
Whether a server has finished the logs of the current scenario is determined from the log identifiers: all logs of the first scenario carry the first log identifier, and all logs of the second scenario carry the second log identifier. The servers synchronously receive and parse all logs of the first scenario. When a server recognizes that the log it is parsing no longer carries the first log identifier, it has finished lineage analysis of all its logs in the first scenario, and it sends a second lineage-analysis completion signal carrying the first log identifier to the other servers so that they know its progress.
When the difference is one, the current server determines that all other servers have completed lineage analysis of their assigned logs in the first scenario. It then obtains the log identifier carried by the log it is currently processing; if this identifier differs from the first log identifier, lineage analysis of all logs in the first scenario is complete, and the current server performs a full deduplication of the lineage results in the result table.
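The decision rule above can be sketched as a single-process simulation in Python. Assumptions: servers are plain objects, "sending" a signal means calling `receive_signal` on the other servers directly, and servers finish one after another; the sketch does not model two servers finishing at exactly the same moment. `deduplicate` keeps the first occurrence of each identical lineage row, one simple reading of "full deduplication". All names are illustrative.

```python
from collections import defaultdict


class Server:
    """Single-process sketch of the completion-signal rule (no real networking)."""

    def __init__(self, name: str, cluster_size: int):
        self.name = name
        self.cluster_size = cluster_size      # number of deployed servers
        self.done_signals = defaultdict(set)  # log identifier -> servers reporting done

    def receive_signal(self, log_id: str, sender: str) -> None:
        self.done_signals[log_id].add(sender)

    def on_next_log(self, finished_log_id: str, current_log_id: str) -> str:
        """Decide what to do once the current log may belong to a new scenario."""
        if current_log_id == finished_log_id:
            return "continue"     # still inside the same scenario
        received = len(self.done_signals[finished_log_id])
        if self.cluster_size - received == 1:
            return "deduplicate"  # all other servers are done: this one is last
        return "broadcast"        # report completion to the other servers


def deduplicate(result_table: list[tuple]) -> list[tuple]:
    """Full deduplication that keeps the first occurrence of each lineage row."""
    return list(dict.fromkeys(result_table))


# Three servers finish scenario "L1" one after another and move on to "L2".
servers = [Server(name, cluster_size=3) for name in ("s1", "s2", "s3")]
actions = []
for server in servers:
    action = server.on_next_log("L1", current_log_id="L2")
    actions.append((server.name, action))
    if action == "broadcast":
        for other in servers:
            if other is not server:
                other.receive_signal("L1", server.name)
print(actions)  # [('s1', 'broadcast'), ('s2', 'broadcast'), ('s3', 'deduplicate')]
```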
Before step S1, that is, before the step in which the plurality of servers synchronously receive the logs, the method further comprises:
the log distribution end identifies the number of logs currently being processed by each of the plurality of pre-deployed servers;
the log distribution end obtains a plurality of logs and distributes them to different servers based on those numbers.
In this embodiment, the log distribution end distributes logs according to the number of logs each server is currently processing, which keeps the processing progress of the servers consistent.
Specifically, the step in which the log distribution end obtains a plurality of logs to be distributed and distributes them to different servers based on the numbers of logs comprises:
the log distribution end assigns logs, one at a time, to the server currently processing the fewest logs, until all logs have been distributed or every server is processing the same number of logs;
and the log distribution end checks whether any logs remain unassigned and, if so, distributes them evenly among the servers.
In this embodiment, the present application provides a plurality of servers that obtain and process log data synchronously. When distributing logs, the log distribution end bases the distribution on the number of logs each server is currently processing. For example, suppose three servers are currently processing x, y and z logs, with x ≥ y ≥ z. The log distribution end assigns logs one at a time to the server currently processing the fewest logs until all logs are distributed or every server is processing the same number of logs (x = y = z); it then distributes the remaining logs evenly among the servers. This keeps the processing progress of the servers similar and maintains the overall data processing speed.
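The two-phase distribution rule can be sketched in Python as follows. Server names, log names and starting loads are made-up sample values, and ties in phase 1 go to the server listed first, a detail the text leaves open. The usage line reproduces the x ≥ y ≥ z example with x = 5, y = 3, z = 2.

```python
def distribute(loads: dict[str, int], logs: list[str]) -> dict[str, list[str]]:
    """Two-phase distribution: least-loaded first, then even round-robin.

    loads maps a server name to the number of logs it is currently processing.
    """
    if not loads:
        raise ValueError("at least one server is required")
    loads = dict(loads)  # work on a copy, leave the caller's dict untouched
    assigned = {name: [] for name in loads}
    i = 0
    # Phase 1: one log at a time to the least-loaded server, until the logs run
    # out or all servers carry the same load (ties go to the first server listed).
    while i < len(logs) and len(set(loads.values())) > 1:
        target = min(loads, key=loads.get)
        assigned[target].append(logs[i])
        loads[target] += 1
        i += 1
    # Phase 2: only reached with logs left when all loads are equal;
    # spread the remainder round-robin so loads differ by at most one.
    names = list(loads)
    for k, log in enumerate(logs[i:]):
        assigned[names[k % len(names)]].append(log)
    return assigned


# Servers a, b, c currently process x=5, y=3, z=2 logs; 8 new logs arrive.
plan = distribute({"a": 5, "b": 3, "c": 2}, [f"log{n}" for n in range(1, 9)])
print(plan)
# {'a': ['log6'], 'b': ['log2', 'log4', 'log7'], 'c': ['log1', 'log3', 'log5', 'log8']}
```

After this plan every server is processing six logs, which is the "similar progress" the distribution rule aims for.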
It should be emphasized that, to further ensure the privacy and security of the result table, the result table may also be stored in a blockchain node.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic means, where each block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
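The hash-linking property described above can be shown with a toy chain of blocks holding result-table rows. This is an illustration under assumptions only: it uses SHA-256 from Python's `hashlib`, serializes rows as JSON, and has no networking, consensus or distributed storage; the helper names `make_block` and `verify_chain` are hypothetical. Tampering with any stored row changes that block's recomputed hash and breaks verification.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first block


def _digest(rows, prev_hash):
    # Hash the block contents together with the previous block's hash.
    body = json.dumps({"rows": rows, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(rows, prev_hash):
    """Create a block holding result-table rows, linked to its predecessor."""
    return {"rows": rows, "prev": prev_hash, "hash": _digest(rows, prev_hash)}


def verify_chain(chain):
    """Return True if every block's hash and back-link are intact."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev or block["hash"] != _digest(block["rows"], block["prev"]):
            return False
        prev = block["hash"]
    return True
```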
The method and system can be applied in the smart healthcare field for data lineage analysis of medical data, thereby promoting the construction of smart cities.
Those skilled in the art will understand that all or part of the processes of the above method embodiments may be implemented by computer-readable instructions stored on a computer-readable storage medium; when executed, these instructions may carry out the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or it may be a random access memory (RAM).
It should be understood that although the steps in the flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages. These need not be completed at the same moment and may be performed at different times; nor need they be performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a data lineage analysis system. This system embodiment corresponds to the method embodiment shown in FIG. 2 and may be applied to a variety of electronic devices.
As shown in FIG. 3, the data lineage analysis system in this embodiment includes a plurality of servers and a log distribution end. Each server 300 includes a receiving module 301, an encryption module 302, a comparison module 303 and a storage module 304, where:
the receiving module 301 is configured to receive the logs allocated by the log distribution end;
the encryption module 302 is configured to obtain the structured query script in a log and to encrypt each structured query statement in the script with a preset encryption algorithm to obtain a statement value, where identical structured query statements yield identical statement values;
the comparison module 303 is configured to obtain the historical statement values stored in the database and check whether any of them is identical to the current statement value; if none is, it parses the structured query statement corresponding to the current statement value to obtain a data lineage result;
the storage module 304 is configured to store the statement value in the database and store the data lineage result in a preset result table.
In this embodiment, the pre-deployed servers receive the logs allocated by the log distribution end in parallel, which allows the logs to be processed concurrently and prevents them from piling up. Each server encrypts its structured query statements with a preset encryption algorithm to obtain statement values, and comparing each value with the historical statement values quickly reveals whether the corresponding statement has already undergone lineage analysis. Only when no identical historical statement value exists is the statement parsed to obtain its data lineage result, so repeated analysis of the same structured query statement is avoided.
In some optional implementations of this embodiment, the encryption module 302 is further configured to encrypt each structured query statement in the structured query script with the MD5 algorithm to obtain the statement value.
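Strictly speaking, MD5 is a one-way message-digest (hash) function rather than a reversible encryption algorithm, but the property relied on here holds: the same input text always produces the same 128-bit value (32 hex characters). The sketch also shows a caveat the document does not address: statements that differ only in whitespace or a trailing semicolon produce different values. The optional `normalize` flag and the function name `md5_statement_value` are hypothetical additions, not part of the described method.

```python
import hashlib
import re


def md5_statement_value(sql, normalize=False):
    """MD5 statement value; `normalize` is a hypothetical pre-processing step."""
    if normalize:
        # Collapse runs of whitespace and drop a trailing semicolon.
        sql = re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


a = "SELECT id FROM orders"
b = "SELECT  id\nFROM orders;"
print(md5_statement_value(a) == md5_statement_value(a))  # True: same text, same value
print(md5_statement_value(a) == md5_statement_value(b))  # False: whitespace differs
print(md5_statement_value(a, True) == md5_statement_value(b, True))  # True
```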
The comparison module 303 includes a parsing sub-module and an extraction sub-module. The parsing sub-module performs a preliminary lineage analysis on the structured query statement to obtain an information syntax tree. The extraction sub-module extracts from the syntax tree a preset target together with the target's source database, source table and source field, and associates them to obtain the data lineage result.
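The extraction step can be approximated for one simple statement shape. This is a deliberately simplified stand-in: instead of building an information syntax tree with a real SQL parser, it matches `INSERT INTO <target> SELECT <fields> FROM <db>.<table>` with a regular expression, ignores joins, subqueries and function calls containing commas, and takes the last dotted component of each selected expression as the source field. The `LineageRecord` type and `extract_lineage` function are hypothetical names.

```python
import re
from typing import List, NamedTuple


class LineageRecord(NamedTuple):
    target: str        # table written by the statement
    source_db: str     # source database ("source library") of the target
    source_table: str  # source table of the target
    source_field: str  # source field feeding the target


# Toy pattern for "INSERT INTO <target> SELECT <fields> FROM <db>.<table>".
_INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s+"
    r"SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<db>\w+)\.(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_lineage(sql: str) -> List[LineageRecord]:
    """Associate target, source database, source table and source fields."""
    match = _INSERT_SELECT.search(sql)
    if match is None:
        return []  # unsupported shape; a real parser would build a syntax tree
    records = []
    for raw in match.group("fields").split(","):
        # Drop any "AS alias", then any table qualifier such as "o.".
        expr = re.split(r"\s+AS\s+", raw.strip(), flags=re.IGNORECASE)[0]
        records.append(LineageRecord(
            target=match.group("target"),
            source_db=match.group("db"),
            source_table=match.group("table"),
            source_field=expr.split(".")[-1],
        ))
    return records
```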
In some optional implementations of this embodiment, the server 300 further includes a time modification module. When a historical statement value identical to the statement value exists, this module determines the record time of the data lineage mapped to that historical statement value and updates the record time to the current time.
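A minimal sketch of the record-time refresh, assuming a dictionary maps each statement value to the record time of its lineage. The name `refresh_if_known` and the use of UTC timestamps are hypothetical. The document does not state the purpose of the refresh; one plausible reading is that it marks lineage that is still in active use.

```python
from datetime import datetime, timezone


def refresh_if_known(value, record_times, now=None):
    """If `value` was analysed before, refresh its lineage record time.

    value:        statement value (e.g. an MD5 hex digest).
    record_times: dict mapping statement value -> record time of its lineage
                  (hypothetical stand-in for the stored lineage rows).
    Returns True on a hit (time refreshed, no parsing needed), else False.
    """
    if value not in record_times:
        return False
    record_times[value] = now if now is not None else datetime.now(timezone.utc)
    return True
```

On a miss the function leaves `record_times` unchanged, so the caller proceeds to parse the statement and store its lineage.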
In some optional implementations of this embodiment, the server 300 further includes an identification module comprising a signal receiving sub-module, an identification sub-module, a de-duplication sub-module and a signal sending sub-module. The signal receiving sub-module receives a first lineage-analysis completion signal carrying a first log identifier. The identification sub-module determines whether the difference between the number of first completion signals received and the number of deployed servers equals one. When the difference is one, the de-duplication sub-module obtains the log identifier carried by the log currently being processed and, if it differs from the first log identifier, has the current server perform a full de-duplication of the data lineage results in the result table. When the difference is not one, the signal sending sub-module obtains the log identifier carried by the log currently being processed and, if it differs from the first log identifier, sends a second lineage-analysis completion signal carrying the first log identifier to all servers.
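The decision each server makes after counting first completion signals can be sketched as a pure function. This reflects one reading of the text, under assumptions: a difference of one is taken to mean that every other deployed server has reported; the `WAIT` outcome for a current log identifier equal to the first one is not specified in the document and is assumed here; and the full de-duplication is modeled as an order-preserving removal of identical rows. All names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"                              # assumed: nothing to do yet
    FULL_DEDUP = "full_dedup"                  # de-duplicate the whole result table
    SEND_SECOND_SIGNAL = "send_second_signal"  # broadcast a second completion signal


def on_first_completion_signal(received, server_count, current_log_id, first_log_id):
    """Decide what the current server does after counting first completion signals.

    received:       number of first completion signals received so far.
    server_count:   number of deployed servers.
    current_log_id: identifier of the log this server is processing now.
    first_log_id:   log identifier carried by the first completion signal.
    """
    # Assumed meaning: a different identifier means this server is past that log.
    moved_on = current_log_id != first_log_id
    if server_count - received == 1:
        return Action.FULL_DEDUP if moved_on else Action.WAIT
    return Action.SEND_SECOND_SIGNAL if moved_on else Action.WAIT


def full_dedup(rows):
    """Order-preserving removal of identical lineage rows (rows must be hashable)."""
    return list(dict.fromkeys(rows))
```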
The log distribution end is configured to identify the number of logs currently being processed by each of the pre-deployed servers, acquire a plurality of logs, and distribute them to the servers based on those numbers.
The log distribution end includes a distribution module and an equal-distribution module. The distribution module assigns logs one at a time to the server currently processing the fewest logs, until all logs have been allocated or every server is processing the same number of logs. The equal-distribution module identifies whether any logs remain unallocated at the log distribution end and, if so, distributes them evenly among the servers.
To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device in this embodiment.
The computer device 200 includes a memory 201, a processor 202 and a network interface 203 communicatively connected to one another through a system bus. Although the figure shows only the computer device 200 with components 201 to 203, not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device, and may interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device or the like.
The memory 201 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk or an optical disk. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as its hard disk or internal memory. In other embodiments, the memory 201 may be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the computer device 200. The memory 201 may also include both an internal storage unit and an external storage device of the computer device 200. In this embodiment, the memory 201 is generally used to store the operating system and the application software installed on the computer device 200, such as the computer-readable instructions of the data lineage analysis method. The memory 201 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 202 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 202 generally controls the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to run the computer-readable instructions stored in the memory 201 or to process data, for example to run the computer-readable instructions of the data lineage analysis method.
The network interface 203 may comprise a wireless network interface or a wired network interface, which network interface 203 is typically used to establish communication connections between the computer device 200 and other electronic devices.
In this embodiment, statement values are obtained with the encryption algorithm and compared directly with the historical statement values, which quickly determines whether a statement needs to be analyzed and thus avoids repeated analysis of the same structured query statement.
Another embodiment of the present application provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the data lineage analysis method described above.
Likewise, in this embodiment, obtaining statement values with the encryption algorithm and comparing them directly with the historical statement values quickly determines whether analysis is needed, so repeated analysis of the same structured query statement is avoided.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and comprising instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
Evidently, the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments but do not limit the patent scope of the present application. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or substitute equivalents for some of their technical features. Any equivalent structure made using the description and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of this application.

Claims (6)

1. A data lineage analysis method, characterized by comprising the following steps:
a log distribution end identifies the number of logs currently being processed by each of a plurality of pre-deployed servers;
the log distribution end assigns logs one at a time to the server currently processing the fewest logs, until all logs have been allocated or every server is processing the same number of logs;
the log distribution end identifies whether any unallocated logs remain and, if so, distributes them evenly among the servers;
the plurality of pre-deployed servers receive, in parallel, the logs distributed by the log distribution end;
each server acquires the structured query script in a log and encrypts each structured query statement in the script with a preset encryption algorithm to obtain a statement value, wherein identical structured query statements yield identical statement values under the encryption algorithm;
the server acquires the historical statement values stored in a database and checks whether any historical statement value is identical to the current statement value; if none is, it performs a preliminary lineage analysis on the structured query statement to obtain an information syntax tree;
a preset target, the source database of the target, the source table of the target and the source field of the target are extracted from the information syntax tree and associated with one another to obtain a data lineage result;
the server stores the statement value in the database and stores the data lineage result in a preset result table;
the current server receives a first lineage-analysis completion signal carrying a first log identifier;
the current server identifies whether the difference between the number of first lineage-analysis completion signals received and the number of deployed servers is one;
when the difference is one, the current server obtains the log identifier carried by the log it is currently processing and, when the current log identifier differs from the first log identifier, performs a full de-duplication of the data lineage results in the result table;
when the difference is not one, the current server obtains the log identifier carried by the log it is currently processing and, when the current log identifier differs from the first log identifier, sends a second lineage-analysis completion signal carrying the first log identifier to all servers.
2. The data lineage analysis method according to claim 1, wherein the step of encrypting each structured query statement in the structured query script with a preset encryption algorithm to obtain the statement value comprises:
encrypting each structured query statement in the structured query script with the MD5 algorithm to obtain the statement value.
3. The data lineage analysis method according to claim 1, further comprising, after the step in which the server compares the statement value with the historical statement values in the database:
when a historical statement value identical to the statement value exists, determining the record time of the data lineage mapped to that historical statement value, and updating the record time to the current time.
4. A data lineage analysis system for performing the steps of the data lineage analysis method according to any one of claims 1 to 3.
5. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the data lineage analysis method according to any one of claims 1 to 3.
6. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the data lineage analysis method according to any one of claims 1 to 3.
CN202011408452.5A 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium Active CN112527816B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011408452.5A CN112527816B (en) 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium
PCT/CN2021/083128 WO2022116425A1 (en) 2020-12-03 2021-03-26 Method and system for data lineage analysis, computer device, and storage medium

Publications (2)

Publication Number Publication Date
CN112527816A CN112527816A (en) 2021-03-19
CN112527816B true CN112527816B (en) 2023-06-02

Family

ID=74997228

Country Status (2)

Country Link
CN (1) CN112527816B (en)
WO (1) WO2022116425A1 (en)

Also Published As

Publication number Publication date
WO2022116425A1 (en) 2022-06-09
CN112527816A (en) 2021-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant