CN115687309A - Non-invasive cigarette warehouse-in and warehouse-out full-process data blood margin construction method and device - Google Patents



Publication number
CN115687309A
Authority
CN
China
Prior art keywords
data
warehouse
cigarette
relationship
nodes
Legal status
Granted
Application number
CN202211717745.0A
Other languages
Chinese (zh)
Other versions
CN115687309B (en)
Inventor
潘晓华
金泳
高扬华
沈诗婧
朱心洲
Current Assignee
Zhejiang University ZJU
China Tobacco Zhejiang Industrial Co Ltd
Original Assignee
Zhejiang University ZJU
China Tobacco Zhejiang Industrial Co Ltd
Application filed by Zhejiang University ZJU and China Tobacco Zhejiang Industrial Co Ltd
Priority to CN202211717745.0A
Publication of CN115687309A
Application granted
Publication of CN115687309B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a non-invasive method and device for constructing the data lineage of the full cigarette warehouse-in and warehouse-out process. The method comprises: obtaining the keyword stream of an SQL expression Q; obtaining the abstract syntax tree of Q from the keyword stream and the Backus-Naur form (BNF) grammar; obtaining the data relationships in Q from the abstract syntax tree; obtaining the correspondence between data fields and their specific data and between data fields and their data types; and finally defining data nodes and data relationships and outputting a data lineage graph. The method builds a complete and correct lineage of the full-process warehouse-in and warehouse-out data without intruding on existing systems. It is simple to implement, requires no elevated security privileges, introduces no security risk, and leaves existing data storage unaffected. It addresses the complexity of data in the cigarette logistics process, the difficulty of controlling that data, and the slowness of locating problems; it enables efficient analysis, control, tracing, and auditing of the full-process data, and improves the control capability and the management and analysis efficiency for that data.

Description

Non-invasive method and device for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out
Technical Field
The invention relates to the technical field of processing full-process data for cigarette warehouse-in and warehouse-out, and in particular to a non-invasive method and device for constructing the lineage of that data.
Background
Data lineage (also called data blood relationship) is metadata that describes how data is generated and evolves over time and how data items relate to one another. Because lineage records how data flows, it supports downstream impact analysis and upstream traceability analysis, which addresses the difficulty of controlling and locating data in the cigarette logistics process. With the development of big data, the construction and application of data lineage have attracted increasing research attention. For example, Bates et al. proposed Linux Provenance Modules, which intercept system-call information through hook functions in the kernel layer of the Linux system and obtain and analyze lineage information from it. Because this approach builds lineage at the operating-system level, it is highly security-sensitive and constrained by the operating-system kernel. To address this, Alkholdi et al. designed hook functions in the Cassandra database that monitor all database operations, extract and analyze those related to data flow, and construct the data lineage; Chacko et al. proposed constructing data lineage in a MongoDB database from the operation log of a document database; and Koelreuteric proposed a knowledge-graph-based method for processing aircraft test data and its lineage, introducing data lineage to improve the efficiency of data management and analysis in aircraft experiments.
The automated operation of a cigarette logistics system generates large amounts of data: entity data about cigarettes, such as batch, type, quantity, and name; report data, such as warehouse-in and warehouse-out reports and transportation reports; control data, consisting of process data generated during loading, transportation, arrival at the warehouse, upper-level operations, terminal operations, interface operations, and so on; and various rule data related to warehouse-in and warehouse-out (for example, the rules used to classify cigarette quality types). These data are complex in composition and interrelated, so an error in recording a single logistics step can corrupt the entire body of logistics data. In automated warehouse-in and warehouse-out operation, data is therefore hard to control, and the link at which abnormal data arose is hard to locate.
Existing methods work well for building lineage at the file and data-table level. In the cigarette warehouse-in and warehouse-out scenario, however, the data volume is large, massive data is stored in relational databases, and data is generated and flows along with the movement of goods into and out of the warehouse, so each item is related to other data and the data as a whole is complex. Lineage construction must therefore run compatibly with the existing databases. The existing relational-database approach to storing and managing this data is limited by foreign-key constraints: two or more tables are joined and referenced, and searching and matching data through foreign keys consumes considerable system resources, so requests cannot be answered in time. This reduces the efficiency of auditing, managing, tracing, and analyzing the data of the warehouse-in and warehouse-out process.
The foregoing background is provided to help those skilled in the art understand the prior art closest to the present invention and to facilitate understanding of the inventive concept and technical solutions of this application. Absent explicit evidence that this matter was disclosed by the filing date of this application, it should not be used to assess the novelty of the technical solutions of this application.
Disclosure of Invention
To solve at least one of the technical problems mentioned in the background, the invention aims to provide a non-invasive method for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out. The method is compatible with existing applications, complete, and correct; it is simple to implement, requires no elevated security privileges, introduces no security risk, and does not affect the existing data storage mode. It addresses the complexity of data in the cigarette logistics process, the difficulty of controlling it, and the slowness of locating problems; it enables efficient analysis, control, tracing, and auditing of the full-process data, and improves the control capability and the management and analysis efficiency for that data.
The non-invasive method for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out comprises the following steps:
data acquisition and analysis: a database-level hook function monitors transactions on the relational database that stores the cigarette warehouse-in and warehouse-out data; when a change operation is detected, the related timing information, the associated data, and the executed SQL expression are captured and stored in a data table;
data relationship analysis: the timing information, the associated data, and the executed SQL expression Q are analyzed to identify data nodes, and the dependency relationships between data nodes are extracted and expressed as triples; the acquired data and the corresponding specific data fields are stored, as are the data and their classification results;
data lineage construction: the full-process data lineage is constructed from the sorting-out result of the full-process warehouse-in and warehouse-out data and from the analysis and binding results for the data nodes and their relationships;
and data lineage storage: the data lineage is stored in a graph database as a directed acyclic graph, which describes the data nodes and the relationships between them in the warehouse-in and warehouse-out process.
In some embodiments, the change operation in the data acquisition and analysis step includes at least one of a query operation, an insert operation, and an update operation.
In some embodiments, in the data relationship analysis step, the timing information, the associated data, and the executed SQL expression Q are analyzed as follows:
(1) Parsing of the SQL expression Q: the input SQL expression Q is converted into a keyword stream; the keyword stream is traversed according to the grammar rules and converted into an abstract syntax tree; the abstract syntax tree is traversed to identify the defined data nodes and extract the dependency relationships between them, which are expressed as triples;
(2) Binding of data: while data is being collected, the specific data involved when Q is executed is monitored and collected at the same time. The acquired data and the corresponding specific data fields are stored in a data dictionary I, which stores the relationship between each data field and its specific data as key-value pairs; the data and their classification results are stored in a classification dictionary M according to the sorting-out result of the full-process warehouse-in and warehouse-out data. Both I and M are hash dictionaries.
In some embodiments, parsing of the SQL expression Q comprises the following steps:
the SQL expression Q is split into characters, $Q = \{c_1, c_2, c_3, \ldots, c_i\}$, where $c_i$ is the $i$-th character of Q;
a deterministic finite automaton D is constructed according to the lexical rules of SQL expressions:
$$f = D(S, Q, \delta, c_1) \tag{1}$$
In formula (1), S is the finite set of states defined by the lexical rules of SQL expressions, $\delta$ is the state-transition function of D, and f is the keyword stream obtained after tokenization.
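Formula (1) treats tokenization as running a DFA over the characters of Q. The sketch below is a minimal, hypothetical illustration for a tiny SQL subset: the state names (playing the role of S), the keyword list, and the token kinds are assumptions, not the patent's lexical rules. `delta` plays the role of δ, and `tokenize` returns the keyword stream f as `(kind, text)` pairs.

```python
from typing import List, Tuple

# Assumed keyword set and single-character symbols for a small SQL subset.
KEYWORDS = {"SELECT", "INSERT", "INTO", "UPDATE", "SET", "FROM", "WHERE", "VALUES", "AND"}
SYMBOLS = set("(),;*=.<>")

Token = Tuple[str, str]  # (kind, text)


def delta(state: str, c: str) -> str:
    """Transition function: next state, or "ACCEPT" when the current token ends before c."""
    if state == "START":
        if c.isspace():
            return "START"
        if c.isalpha() or c == "_":
            return "IDENT"
        if c.isdigit():
            return "NUMBER"
        if c == "'":
            return "STRING"
        if c in SYMBOLS:
            return "SYMBOL"
        raise ValueError(f"unexpected character {c!r}")
    if state == "IDENT":
        return "IDENT" if (c.isalnum() or c == "_") else "ACCEPT"
    if state == "NUMBER":
        return "NUMBER" if (c.isdigit() or c == ".") else "ACCEPT"
    if state == "STRING":
        return "STRING_END" if c == "'" else "STRING"
    return "ACCEPT"  # SYMBOL and STRING_END accept after one character


def classify(state: str, text: str) -> Token:
    """Map an accepting state and its lexeme to a token."""
    if state == "IDENT":
        upper = text.upper()
        return ("KEYWORD", upper) if upper in KEYWORDS else ("IDENT", text)
    if state == "NUMBER":
        return ("NUMBER", text)
    if state == "STRING_END":
        return ("STRING", text[1:-1])  # strip the surrounding quotes
    return ("SYMBOL", text)


def tokenize(q: str) -> List[Token]:
    """Run the DFA over the characters c_1..c_i of q and return the keyword stream f."""
    tokens: List[Token] = []
    state, start, i = "START", 0, 0
    while i <= len(q):
        c = q[i] if i < len(q) else " "  # trailing sentinel flushes the last token
        nxt = delta(state, c)
        if nxt == "ACCEPT":
            tokens.append(classify(state, q[start:i]))
            state = "START"
            continue  # re-read c from the start state
        if state == "START" and nxt != "START":
            start = i  # a new lexeme begins here
        state = nxt
        i += 1
    if state != "START":
        raise ValueError("unterminated string literal")
    return tokens
```

For example, `tokenize("SELECT qty FROM stock WHERE batch = 'B01'")` yields eight tokens, starting with `("KEYWORD", "SELECT")` and ending with `("STRING", "B01")`. Escaped quotes and multi-character operators such as `<=` are deliberately left out to keep the automaton small.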
D converts Q into a keyword stream $f = \{C_1, C_2, C_3, \ldots, C_i\}$, where $C_i$ is the $i$-th keyword of Q. Once the keyword stream is obtained, a recursive function G is constructed:
$$T = G(f, \mathit{grammar}) \tag{2}$$
In formula (2), grammar is the Backus-Naur form (BNF) grammar corresponding to the SQL expression Q. The recursive function G converts the keyword stream f into an abstract syntax tree $T = (f, R)$, where R is the set of connection relations between keywords in f, $R = \{(C_1, C_i), (C_j, C_k), \ldots\}$.
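Formula (2) describes G as a recursive function driven by a BNF grammar. A common way to realize such a function is a recursive-descent parser with one method per grammar rule. The sketch below assumes a tiny hypothetical grammar (shown in the comments) covering `INSERT ... SELECT` and `SELECT ... WHERE col = literal`; it consumes `(kind, text)` tokens and returns an AST whose parent-child links play the role of the relation set R. The grammar, node labels, and the `Node` and `Parser` classes are illustrative assumptions, not the SQL_BNF used in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed grammar (a tiny BNF subset, for illustration only):
#   <stmt>   ::= <insert> | <select>
#   <insert> ::= INSERT INTO <ident> "(" <cols> ")" <select>
#   <select> ::= SELECT <cols> FROM <ident> [ WHERE <ident> "=" <literal> ]
#   <cols>   ::= <ident> { "," <ident> }

Token = Tuple[str, str]  # (kind, text)


@dataclass
class Node:
    label: str                   # grammar symbol, e.g. "insert", "cols", "column"
    value: Optional[str] = None  # leaf text: table name, column name, or literal
    children: List["Node"] = field(default_factory=list)


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Token:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def expect(self, kind: str, text: Optional[str] = None) -> str:
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok}")
        self.pos += 1
        return tok[1]

    def stmt(self) -> Node:
        node = self.insert() if self.peek() == ("KEYWORD", "INSERT") else self.select()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected trailing token {self.peek()}")
        return node

    def insert(self) -> Node:
        self.expect("KEYWORD", "INSERT")
        self.expect("KEYWORD", "INTO")
        target = Node("table", self.expect("IDENT"))
        self.expect("SYMBOL", "(")
        cols = self.cols()
        self.expect("SYMBOL", ")")
        return Node("insert", children=[target, cols, self.select()])

    def select(self) -> Node:
        self.expect("KEYWORD", "SELECT")
        cols = self.cols()
        self.expect("KEYWORD", "FROM")
        children = [cols, Node("table", self.expect("IDENT"))]
        if self.peek() == ("KEYWORD", "WHERE"):
            self.pos += 1
            column = Node("column", self.expect("IDENT"))
            self.expect("SYMBOL", "=")
            kind, text = self.peek()
            if kind not in ("STRING", "NUMBER"):
                raise SyntaxError(f"expected literal, got {(kind, text)}")
            self.pos += 1
            children.append(Node("where", children=[column, Node("literal", text)]))
        return Node("select", children=children)

    def cols(self) -> Node:
        node = Node("cols", children=[Node("column", self.expect("IDENT"))])
        while self.peek() == ("SYMBOL", ","):
            self.pos += 1
            node.children.append(Node("column", self.expect("IDENT")))
        return node


def G(f: List[Token]) -> Node:
    """Recursive-descent realization of T = G(f, grammar); the grammar lives in the methods."""
    return Parser(f).stmt()
```

Hard-coding the grammar into methods keeps the sketch short; a production parser would more likely be generated from a full SQL grammar.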
In the abstract syntax tree T, the data nodes involved in Q lie on the leaf nodes, and their parent nodes carry the concrete semantic relations.
A function P is defined that traverses T top-down to obtain the relationships between data in Q:
$$RD = P(T) \tag{3}$$
In formula (3), RD is the set of relationships between data in Q obtained by P from T, with $RD = (E, L, S)$, where $E = \{datanode_1, datanode_2, \ldots, datanode_n\}$ is the set of data nodes involved in the full warehouse-in and warehouse-out process, each $datanode$ being an instantiated data node; $L = \{l_1, l_2, \ldots, l_r\}$ is the set of relationships between data nodes; and $S \subseteq E \times L \times E$ is the set of triples describing the relationships between data nodes. Once RD is obtained, it is bound to the recorded specific data.
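The top-down traversal P of formula (3) can be illustrated on the AST of `INSERT INTO out_stock (batch, qty) SELECT batch, qty FROM in_stock`. The sketch below is hypothetical: it represents AST nodes as `(label, value, children)` tuples, names data nodes `table.column`, and emits one triple per column copied by an `INSERT ... SELECT`, using the relation name `from` with the edge pointing from the source column to the target column. These conventions are assumptions; the patent does not specify which relation types P extracts.

```python
from typing import Optional, Set, Tuple

# Hypothetical AST node shape: (label, value, children)
AstNode = Tuple[str, Optional[str], list]
Triple = Tuple[str, str, str]


def P(tree: AstNode) -> Tuple[Set[str], Set[str], Set[Triple]]:
    """Traverse the AST top-down and return RD = (E, L, S) with S a subset of E x L x E."""
    E: Set[str] = set()
    L: Set[str] = set()
    S: Set[Triple] = set()

    def visit(node: AstNode) -> None:
        label, _, children = node
        if label == "insert":
            target, target_cols, select = children
            source_cols, source_table = select[2][0], select[2][1]
            if len(target_cols[2]) != len(source_cols[2]):
                raise ValueError("column count mismatch in INSERT ... SELECT")
            # Columns map positionally: the i-th selected column feeds the i-th target column.
            for dst_col, src_col in zip(target_cols[2], source_cols[2]):
                src = f"{source_table[1]}.{src_col[1]}"
                dst = f"{target[1]}.{dst_col[1]}"
                E.update((src, dst))
                L.add("from")
                S.add((src, "from", dst))
        elif label == "select":
            cols, table = children[0], children[1]
            for col in cols[2]:
                E.add(f"{table[1]}.{col[1]}")  # columns read by a query are data nodes too
        for child in children:  # parent is handled before its children: top-down order
            visit(child)

    visit(tree)
    return E, L, S
```

On the example statement, S contains `("in_stock.batch", "from", "out_stock.batch")` and `("in_stock.qty", "from", "out_stock.qty")`, and every triple uses only members of E and L, as $S \subseteq E \times L \times E$ requires.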
In some embodiments, in step (2), binding of data, the acquired data and the corresponding specific data fields are stored in the data dictionary I, a hash dictionary that stores the relationship between each data field and its specific data as key-value pairs. The correspondence U between a data node and its specific data is obtained through a function H:
$$U = H(datanode, I), \quad datanode \in E \tag{4}$$
In some embodiments, in step (2), binding of data, the data and their classification results are stored in the classification dictionary M according to the sorting-out result of the full-process warehouse-in and warehouse-out data; correspondingly, the data type of a data field is obtained from the field name through a function GT:
$$Type = GT(datanode, M), \quad datanode \in E \tag{5}$$
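Formulas (4) and (5) are lookups in hash dictionaries, which give expected constant-time access per data node. A minimal sketch with Python dicts follows; the dictionary contents, the `table.field` naming of data nodes, the category labels (borrowed from the entity, report, process, and rule data mentioned in the background), and the fallback value `"unclassified"` are illustrative assumptions.

```python
from typing import Any, Dict

# Hypothetical data dictionary I: data node (table.field) -> captured specific data.
I: Dict[str, Any] = {
    "in_stock.batch": ["B01", "B02"],
    "in_stock.qty": [120, 80],
}

# Hypothetical classification dictionary M: field name -> data category.
M: Dict[str, str] = {
    "batch": "entity",
    "qty": "entity",
    "report_id": "report",
    "loading_time": "process",
    "quality_rule": "rule",
}


def H(datanode: str, I: Dict[str, Any]) -> Any:
    """U = H(datanode, I): the specific data bound to a data node."""
    try:
        return I[datanode]
    except KeyError:
        raise KeyError(f"no captured data for {datanode!r}") from None


def GT(datanode: str, M: Dict[str, str], default: str = "unclassified") -> str:
    """Type = GT(datanode, M): look up the category by the field-name part of the node."""
    field_name = datanode.rsplit(".", 1)[-1]
    return M.get(field_name, default)
```

With these dictionaries, `H("in_stock.qty", I)` returns `[120, 80]` and `GT("in_stock.batch", M)` returns `"entity"`.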
In some embodiments, in the data lineage construction step, the full-process data lineage is constructed from the sorting-out result of the full-process data and from the analysis and binding results for the data nodes and their relationships, as follows:
a data lineage graph $GL = (E, RD)$ is defined, where E is the set of data nodes in the graph, $E = \{DataNode_1, DataNode_2, DataNode_3, \ldots, DataNode_i\}$, and a data node is defined as:
$$DataNode: \langle ID, name_E, type_E, data, updated\_time \rangle \tag{6}$$
where ID is the unique identifier of the data node, $name_E$ is its name, $type_E$ is its type, data is its specific data, and updated_time is the update timing information of the data field. RD in the data lineage graph is the relationship between data nodes, defined as:
$$RD: \langle datanode, [t_{start}, t_{end}], type_{RD}, name_{RD}, [attr_1, attr_2], \ldots \rangle \tag{7}$$
where $t_{start}$ and $t_{end}$ are the start and end times of the lineage relationship, $name_{RD}$ is the name of the relationship, $type_{RD}$ is its type, and $attr_1$ and $attr_2$ are attribute data implied by the relationship.
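Definitions (6) and (7) map naturally onto record types. The sketch below is one possible encoding, not the patent's data model: it assumes the `datanode` element of RD is a (source ID, target ID) pair, stores `[attr_1, attr_2, ...]` as a list, and adds a hypothetical `LineageGraph` container for GL = (E, RD) that rejects relationships with unknown endpoints or a reversed time interval.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Tuple


@dataclass
class DataNode:
    """Formula (6): <ID, name_E, type_E, data, updated_time>."""
    ID: str
    name_E: str
    type_E: str
    data: Any
    updated_time: datetime


@dataclass
class RD:
    """Formula (7): <datanode, [t_start, t_end], type_RD, name_RD, [attr_1, attr_2], ...>."""
    datanode: Tuple[str, str]  # assumed (source ID, target ID) pair
    t_start: datetime
    t_end: datetime
    type_RD: str
    name_RD: str
    attrs: List[Any] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.t_end < self.t_start:
            raise ValueError("t_end must not precede t_start")


@dataclass
class LineageGraph:
    """GL = (E, RD): data nodes keyed by ID plus a list of relationships."""
    nodes: Dict[str, DataNode] = field(default_factory=dict)
    relations: List[RD] = field(default_factory=list)

    def add_node(self, node: DataNode) -> None:
        self.nodes[node.ID] = node

    def add_relation(self, rel: RD) -> None:
        src, dst = rel.datanode
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {rel.datanode}")
        self.relations.append(rel)
```

Validating in `__post_init__` and `add_relation` keeps malformed relationships out of GL before it is written to storage.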
In some embodiments, in the data lineage construction step, once the data nodes and relationships have been defined from the obtained data relationship RD, the data lineage is constructed as follows:
Input:
the SQL expression Q; the data dictionary I of the specific data involved when Q is executed; the execution timing information T; the BNF grammar SQL_BNF of the SQL language; and the classification dictionary M of the full-process warehouse-in and warehouse-out data.
Output:
the data lineage graph GL.
Steps:
obtain the keyword stream f of Q through formula (1);
input f and SQL_BNF to formula (2) to obtain the abstract syntax tree T of Q;
input T to formula (3) to obtain the data relationship RD in Q;
input I to formula (4) to obtain the correspondence between the data fields in Q and their specific data;
input M to formula (5) to obtain the correspondence between the data fields in Q and their data types;
define the data nodes E and the relationships RD of the data lineage graph GL through formulas (6) and (7);
output the data lineage graph GL.
In some embodiments, in the data lineage construction step, DFS(DataNode) determines whether the data lineage graph GL contains a loop; if it does, DEL(R(StratNode, DataNode), 'from') deletes the relationships named 'from' for which DataNode is the in-degree (target) node.
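One reading of this loop check is sketched below, under stated assumptions: edges are `(start_node, relation_name, end_node)` triples, DFS(DataNode) is taken to mean "can DataNode reach itself" (a loop passes through it), and DEL removes every edge named `'from'` whose end node is DataNode. The function names `dfs_on_loop`, `delete_in_edges`, and `break_loop` are hypothetical stand-ins for DFS and DEL. The search uses an explicit stack rather than recursion so that long lineage chains cannot exhaust Python's recursion limit.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (start_node, relation_name, end_node)


def dfs_on_loop(data_node: str, edges: List[Edge]) -> bool:
    """Depth-first search: True if data_node can reach itself, i.e. it lies on a loop."""
    adj: Dict[str, List[str]] = {}
    for start, _, end in edges:
        adj.setdefault(start, []).append(end)
    stack = list(adj.get(data_node, []))
    seen: Set[str] = set()
    while stack:
        node = stack.pop()
        if node == data_node:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, []))
    return False


def delete_in_edges(edges: List[Edge], data_node: str, name: str = "from") -> List[Edge]:
    """DEL(R(start, data_node), name): drop edges named `name` that end at data_node."""
    return [e for e in edges if not (e[2] == data_node and e[1] == name)]


def break_loop(edges: List[Edge], data_node: str) -> List[Edge]:
    """If a loop passes through data_node, remove its incoming 'from' edges."""
    if dfs_on_loop(data_node, edges):
        return delete_in_edges(edges, data_node, "from")
    return edges
```

For the loop a → b → c → a, `break_loop(edges, "a")` drops `("c", "from", "a")` and leaves a directed acyclic chain. A loop that enters DataNode through an edge with a different relation name would survive this step, so the check is only as complete as the naming convention it relies on.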
In some embodiments, in the data lineage storage step, the graph database is Neo4j.
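As a hedged illustration of storing GL in Neo4j, the sketch below only builds parameterized Cypher statements and does not connect to a database. The `DataNode` label, the `LINEAGE` relationship type, and the property names are assumptions. A fixed relationship type is used, with the relation name kept as a property, because Cypher does not accept relationship types as parameters; `MERGE` keeps repeated loads from duplicating nodes or edges. The resulting `(query, parameters)` pairs could then be passed to `session.run` in the official Neo4j Python driver.

```python
from typing import Any, Dict, List, Tuple

NODE_QUERY = (
    "MERGE (n:DataNode {id: $id}) "
    "SET n.name = $name, n.type = $type, n.updated_time = $updated_time"
)
EDGE_QUERY = (
    "MATCH (a:DataNode {id: $src}), (b:DataNode {id: $dst}) "
    "MERGE (a)-[r:LINEAGE {name: $name}]->(b) "
    "SET r.type = $type, r.t_start = $t_start, r.t_end = $t_end"
)
NODE_KEYS = {"id", "name", "type", "updated_time"}
EDGE_KEYS = {"src", "dst", "name", "type", "t_start", "t_end"}


def to_cypher(
    nodes: List[Dict[str, Any]], edges: List[Dict[str, Any]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (query, parameters) pairs; node MERGEs come first so MATCH finds both endpoints."""
    statements: List[Tuple[str, Dict[str, Any]]] = []
    for params in nodes:
        missing = NODE_KEYS - set(params)
        if missing:
            raise ValueError(f"node missing {sorted(missing)}")
        statements.append((NODE_QUERY, params))
    for params in edges:
        missing = EDGE_KEYS - set(params)
        if missing:
            raise ValueError(f"edge missing {sorted(missing)}")
        statements.append((EDGE_QUERY, params))
    return statements
```

Separating statement construction from execution keeps the sketch testable without a running Neo4j instance.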
A non-invasive device for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements at least one step of the non-invasive full-process data lineage construction method described above.
A computer-readable storage medium stores a computer program which, when executed, implements at least one step of the non-invasive method for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out.
The beneficial effects of this application are as follows:
1) For the specific data scenario of cigarette warehouse-in and warehouse-out, a non-invasive method for constructing the full-process data lineage is provided that is compatible with existing applications and is complete and correct. Compared with the prior art, it is simple to implement, requires no elevated security privileges, introduces no security risk, and does not affect the existing data storage mode: the lineage is constructed non-invasively while existing data storage continues to run stably. Based on the data lineage, efficient analysis, control, tracing, and auditing of the full-process warehouse-in and warehouse-out data are achieved.
2) The invention applies data lineage technology to cigarette logistics for the first time. It addresses the complexity of data, the difficulty of controlling it, and the slowness of locating problems in the cigarette logistics process, and improves cigarette enterprises' data management capability as well as the control capability, management efficiency, and analysis efficiency for full-process warehouse-in and warehouse-out data.
Drawings
To present the above and other objects, features, advantages, and examples of the present invention more clearly, the drawings used in the detailed description are briefly introduced below. These drawings are merely exemplary; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the data lineage construction method;
FIG. 2 is an overall diagram of data relationship analysis;
FIG. 3 is a diagram of the overall SQL expression parsing method;
FIG. 4 is a schematic diagram of the main data lineage construction algorithm;
FIG. 5 is a schematic diagram of an example data lineage graph for cigarette warehouse-in data;
FIG. 6 is a schematic diagram of the detailed data contained in a cigarette data node.
Detailed Description
Those skilled in the art may appropriately substitute or modify process parameters to implement the present disclosure; all such similar substitutions and modifications will be apparent to them and are deemed to be included in the present invention. Although the products and methods described herein have been presented in terms of preferred embodiments, it will be apparent to those of ordinary skill in the art that variations and modifications may be made to them without departing from the spirit and scope of the invention.
Unless defined otherwise, technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in practicing or testing the present invention, suitable methods and materials are described herein, and other suitable methods and materials known in the art may also be used. Unless otherwise specified, the materials, methods, and examples described herein are illustrative only and are not intended to be limiting. All publications, patent applications, patents, provisional applications, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including its definitions, will control.
To facilitate understanding of the embodiments of the present invention, the abbreviations and key terms they may involve are first explained or defined.
DFA: Deterministic Finite Automaton;
AST: Abstract Syntax Tree;
BNF: Backus-Naur Form.
The present invention is described in detail below.
Constructing the full-process data lineage of cigarette warehouse-in and warehouse-out involves the following main difficulties:
(1) The data lineage must be constructed non-invasively. Most existing lineage construction methods are invasive; for example, the method of Bates et al. constructs lineage invasively at the operating-system level, which is efficient but highly security-sensitive. Because the existing cigarette warehouse-in and warehouse-out data management system has strict security requirements, invasive lineage construction is unsuitable for it.
(2) The method must be compatible with the existing storage and management of full-process warehouse-in and warehouse-out data: constructing the lineage must not affect the original data storage and management, and must be done at low resource cost.
(3) Data must be identified, and the relationships between data mined, accurately and completely. In the warehouse-in and warehouse-out scenario, the data volume is large and massive data is stored in relational databases, so complete and accurate data relationships are the key to constructing and applying data lineage.
On this basis, the invention applies data lineage technology to cigarette logistics for the first time and provides a lineage-based analysis method for full-process cigarette warehouse-in and warehouse-out data, addressing the complexity of cigarette logistics data and the difficulty of controlling it. The method constructs the data lineage of the full-process data and uses it to control, locate, and analyze the data. Specific embodiments are as follows.
Example 1:
As shown in FIG. 1, a non-invasive method for constructing the full-process data lineage of cigarette warehouse-in and warehouse-out is provided, comprising data acquisition and analysis, data relationship analysis, data lineage construction, and data lineage storage, described in detail below.
First step, data acquisition and analysis
In a modern cigarette storage and logistics system, data is associated and updated automatically by various devices as cigarettes move into and out of the warehouse, giving a high degree of informatization. The related data is stored in a relational database during this process, and the evolution and flow of data, and the creation of relationships between data, are reflected in the insert, query, and update transactions on that database. The full-process data lineage can therefore be constructed by acquiring and analyzing the transaction data of the relational database. A database-level hook function monitors the transactions of the relational database; whenever a query (SELECT), insert (INSERT), or update (UPDATE) operation is detected, the related timing information, the associated data, and the executed SQL expression Q are captured and stored in a data table, on which lineage construction and analysis are then based.
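The patent does not name a database product or hook API. As a stand-in, the sketch below uses SQLite's statement-trace hook through Python's standard `sqlite3` module (`Connection.set_trace_callback`), which calls a function with the text of each SQL statement the connection executes, without changing the application's tables or queries. The `SqlCapture` class, the `sql_log` table, and the choice to record only statement text and a timestamp are assumptions; capturing the associated row data would need additional, database-specific mechanisms. The callback only buffers records, to avoid re-entering the traced connection from inside the hook; records are flushed to a separate log connection.

```python
import sqlite3
import time
from typing import List, Tuple

TRACKED = ("SELECT", "INSERT", "UPDATE")


class SqlCapture:
    """Trace callback that buffers (timestamp, sql) for SELECT/INSERT/UPDATE statements."""

    def __init__(self) -> None:
        self.records: List[Tuple[float, str]] = []

    def __call__(self, sql: str) -> None:
        # Invoked by sqlite3 for every statement run on the traced connection.
        if sql.lstrip().upper().startswith(TRACKED):
            self.records.append((time.time(), sql))

    def flush(self, log_conn: sqlite3.Connection) -> None:
        """Persist buffered records into a separate log database, then clear the buffer."""
        log_conn.execute("CREATE TABLE IF NOT EXISTS sql_log (ts REAL, sql TEXT)")
        log_conn.executemany("INSERT INTO sql_log VALUES (?, ?)", self.records)
        log_conn.commit()
        self.records.clear()


if __name__ == "__main__":
    app = sqlite3.connect(":memory:")
    capture = SqlCapture()
    app.set_trace_callback(capture)  # hook installed; application SQL is unchanged
    app.execute("CREATE TABLE stock (batch TEXT, qty INTEGER)")
    app.execute("INSERT INTO stock VALUES ('B01', 120)")
    app.execute("UPDATE stock SET qty = 100 WHERE batch = 'B01'")
    app.execute("SELECT qty FROM stock").fetchall()
    app.commit()

    log = sqlite3.connect(":memory:")
    capture.flush(log)
    for ts, sql in log.execute("SELECT ts, sql FROM sql_log"):
        print(ts, sql)
```

In the demo, the `CREATE TABLE` statement and any transaction-control statements are traced but filtered out, so only the INSERT, UPDATE, and SELECT statements reach `sql_log`.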
Second step, data relationship analysis
Once the timing information, the SQL expression Q, and the associated data from the relational database have been obtained, they must be analyzed to derive the relationships between data for subsequent lineage construction. Data relationship analysis has two main steps: (1) parsing of the SQL expression Q; and (2) binding of data. The overall diagram of data relationship analysis is shown in FIG. 2.
(1) Parsing of SQL expression Q
Data relationships are obtained mainly by parsing the SQL expression Q. The input expression Q is first converted into a keyword (token) stream; the token stream is then traversed according to the grammar rules and converted into an abstract syntax tree (AST); finally, the AST is traversed to identify the defined data nodes and extract the dependency relationships between them, which are represented as triples. The overall parsing method for Q is shown in FIG. 3.
The SQL expression Q is first split into characters, $Q = \{c_1, c_2, c_3, \ldots, c_i\}$, where $c_i$ is the $i$-th character of Q. According to the lexical rules of SQL expressions, a deterministic finite automaton (DFA) D is constructed:
$$f = D(S, Q, \delta, c_1) \tag{1}$$
In formula (1), S is the finite set of states defined by the lexical rules of SQL expressions, $\delta$ is the state-transition function of the DFA, and f is the keyword stream obtained after tokenization. Through the DFA, Q is converted into the keyword stream $f = \{C_1, C_2, C_3, \ldots, C_i\}$, where $C_i$ is the $i$-th keyword of Q. Once the keyword stream is obtained, a recursive function G is constructed:
$$T = G(f, \mathit{grammar}) \tag{2}$$
In formula (2), grammar is the BNF grammar corresponding to Q, and G recursively converts the keyword stream f into the abstract syntax tree $T = (f, R)$, where R is the set of connection relations between keywords in f, defined as $R = \{(C_1, C_i), (C_j, C_k), \ldots\}$. In T, the data nodes involved in Q lie on the leaf nodes, and their parent nodes carry the concrete semantic relations. A function P is defined that traverses T top-down to obtain the relationships between data in Q:
$$RD = P(T) \tag{3}$$
In formula (3), RD is the set of relationships between data in Q obtained by P from T, with $RD = (E, L, S)$, where $E = \{datanode_1, datanode_2, \ldots, datanode_n\}$ is the set of data nodes involved in the full warehouse-in and warehouse-out process, each $datanode$ being an instantiated data node; $L = \{l_1, l_2, \ldots, l_r\}$ is the set of relationships between data nodes; and $S \subseteq E \times L \times E$ is the set of triples describing the relationships between data nodes. Once RD is obtained, it is bound to the recorded specific data.
(2) Binding of data
When data is collected, relevant specific data when the SQL expression Q is executed is monitored and collected at the same time. The specific data types collected are shown in table 1.
Table 1 Types of data collected
Type | Description | Storage mode
--- | --- | ---
Table | Complete table data, or multiple fields of a table | Stored as an array whose elements are dictionaries
Field | A single field of a table | Stored as an array of the concrete values
Single datum | A single data item involved | /
The data acquisition results and their corresponding concrete data fields are stored in a data dictionary I, a hash dictionary that stores the relationship between each data field and its concrete data as key-value pairs. The correspondence U between a data node and its concrete data is obtained through a function H:
$$U = H(datanode, I), \quad datanode \in E \tag{4}$$
According to the result of sorting out the full-process cigarette warehouse-in and warehouse-out data, the data and their corresponding classification results are stored in a classification dictionary M, and the data Type of a data field is obtained from the field's name through a function GT:
$$Type = GT(datanode, M), \quad datanode \in E \tag{5}$$
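Formulas (4) and (5) amount to dictionary lookups. The following is a minimal sketch, assuming that I is a plain Python dict keyed by data-field name using a hypothetical "table.column" naming, that M is keyed by the bare field name, and that all sample contents (including the "operator_id" field) are illustrative only; values follow the storage modes of Table 1.

```python
from __future__ import annotations

# Hypothetical sample of data dictionary I, following Table 1: a table is an
# array of dicts, a field is an array of values, a single datum is stored as-is.
I: dict[str, object] = {
    "stock": [{"lot_no": "N20081811", "qty": 50}],
    "stock.qty": [50],
    "operator_id": "05",
}

# Hypothetical classification dictionary M: bare field name -> data category.
M: dict[str, str] = {"lot_no": "cigarette", "qty": "cigarette", "operator_id": "personnel"}


def H(datanode: str, I: dict[str, object]) -> tuple[str, object]:
    """Formula (4) sketch: pair a data node with its concrete collected data."""
    if datanode not in I:
        raise KeyError(f"no collected data for {datanode!r}")
    return datanode, I[datanode]


def GT(datanode: str, M: dict[str, str], default: str = "unclassified") -> str:
    """Formula (5) sketch: look up a node's data Type by its field name."""
    field_name = datanode.rsplit(".", 1)[-1]  # 'stock.lot_no' -> 'lot_no'
    return M.get(field_name, default)
```

Keying M by the bare field name lets the same column in different tables (for example `inbound.lot_no` and `stock.lot_no`) share one classification; a real system might instead need table-qualified keys.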
The third step, construction of the data lineage
The lineage of the full-process cigarette warehouse-in and warehouse-out data is constructed from the result of sorting out that data and from the results of analyzing and binding the relationships between data nodes.
A data lineage graph $GL = (E, RD)$ is defined, where E is the set of data nodes in the graph, $E \in \{DataNode_1, DataNode_2, DataNode_3, \ldots, DataNode_i\}$, and each data node is defined as:
$$DataNode: \langle ID, name_E, type_E, data, updated\_time \rangle \tag{6}$$
where ID is the unique identifier of the data node, $name_E$ is its name, $type_E$ is its type, data is its concrete data, and updated_time is the update timing information of the data field. RD in the data lineage graph is the relationship between data nodes, defined as:
$$RD: \langle datanode, [t_{start}, t_{end}], type_{RD}, name_{RD}, [attr_1, attr_2], \ldots \rangle \tag{7}$$
where $t_{start}$ and $t_{end}$ are respectively the start and end times of the lineage relationship, $name_{RD}$ is the name of the relationship, $type_{RD}$ is its type, and $attr_1$ and $attr_2$ are attribute data contained in the relationship.
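Formulas (6) and (7) map naturally onto record types. Below is a minimal sketch with Python dataclasses, assuming that each relationship stores explicit `source` and `target` node IDs (formula (7) names only a single `datanode`, so the endpoint fields are an assumption) and that `LineageGraph`, `add_node` and `add_edge` are hypothetical helper names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataNode:
    """Formula (6): <ID, name_E, type_E, data, updated_time>."""
    id: str
    name_e: str
    type_e: str
    data: object
    updated_time: datetime


@dataclass
class Relation:
    """Formula (7); source/target are assumed fields for the two endpoints."""
    source: str
    target: str
    t_start: datetime
    t_end: datetime
    type_rd: str
    name_rd: str
    attrs: list[object] = field(default_factory=list)


@dataclass
class LineageGraph:
    """GL = (E, RD): nodes indexed by ID plus a list of relationships."""
    nodes: dict[str, DataNode] = field(default_factory=dict)
    edges: list[Relation] = field(default_factory=list)

    def add_node(self, node: DataNode) -> None:
        self.nodes[node.id] = node

    def add_edge(self, rel: Relation) -> None:
        # Only connect nodes that are already registered in E.
        if rel.source not in self.nodes or rel.target not in self.nodes:
            raise KeyError("both endpoints must be registered data nodes")
        if rel.t_end < rel.t_start:
            raise ValueError("relationship ends before it starts")
        self.edges.append(rel)
```

Validating endpoints on insertion keeps every relationship anchored to a node of E, which matches the requirement that S be a subset of E × L × E.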
After the data nodes and relationships have been defined from the obtained data relationship RD, the data lineage is constructed; the main construction algorithm is shown in Fig. 4.
The inputs of the algorithm are: the SQL expression Q; the data dictionary I holding the concrete data involved in executing Q; the execution timing information T; the BNF grammar SQL_BNF of the SQL language; and the classification dictionary M of the full-process cigarette warehouse-in and warehouse-out data.
The data lineage graph GL is constructed through formulas (1) to (7), which define its data nodes E and relationships RD. DFS(DataNode) then determines whether GL contains a cycle; if it does, DEL(StartNode, DataNode) deletes the relationship whose in-degree (target) node is DataNode and whose relationship name is 'from'.
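The cycle check can be illustrated with a standard three-color depth-first search, which reports a "back edge" pointing to a node still on the current path. This is a minimal sketch under an assumed reading of DFS/DEL: edges are plain `(start, end, name)` tuples, the back edge found is deleted only if its name is 'from', and `find_back_edge`, `delete_from_edge` and `make_acyclic` are hypothetical names rather than the patent's functions.

```python
from __future__ import annotations

from collections import defaultdict

Edge = tuple[str, str, str]  # (start node, end node, relationship name)


def find_back_edge(edges: list[Edge]) -> Edge | None:
    """Return one edge that closes a cycle (a DFS back edge), or None if acyclic."""
    adj: dict[str, list[Edge]] = defaultdict(list)
    for e in edges:
        adj[e[0]].append(e)
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    color: dict[str, int] = defaultdict(int)

    def visit(u: str) -> Edge | None:
        color[u] = GRAY
        for e in adj[u]:
            v = e[1]
            if color[v] == GRAY:          # v is an ancestor on the current path
                return e
            if color[v] == WHITE and (found := visit(v)) is not None:
                return found
        color[u] = BLACK
        return None

    for u in list(adj):
        if color[u] == WHITE and (found := visit(u)) is not None:
            return found
    return None


def delete_from_edge(edges: list[Edge], start: str, data_node: str) -> list[Edge]:
    """Analogue of DEL(StartNode, DataNode): drop start -> data_node edges named 'from'."""
    return [e for e in edges if not (e[0] == start and e[1] == data_node and e[2] == "from")]


def make_acyclic(edges: list[Edge]) -> list[Edge]:
    edges = list(edges)
    while (back := find_back_edge(edges)) is not None:
        start, data_node, name = back
        if name != "from":                # this sketch only removes 'from' relationships
            raise ValueError(f"cycle closed by a non-'from' edge: {back}")
        edges = delete_from_edge(edges, start, data_node)
    return edges
```

The recursive DFS is fine for illustration; very deep lineage chains would need an iterative version to stay within Python's recursion limit.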
The fourth step, storage of the data lineage
After construction, the data lineage can be stored in a graph database (such as Neo4j) as a directed acyclic graph that describes the data nodes and the relationships between them in the cigarette warehouse-in and warehouse-out process. Graph queries can then locate relevant data quickly, and the lineage stored in the graph database can be queried directly, used by downstream tasks, or serve as a data source for other application systems.
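Loading the lineage into Neo4j boils down to one parameterized Cypher statement per node and per relationship. The sketch below only builds `(query, parameters)` pairs, so it runs without a database; the `DataNode` label, the `LINEAGE` relationship type and the dictionary keys are hypothetical choices, not taken from the source.

```python
from __future__ import annotations

# Hypothetical Cypher templates; label and relationship type are illustrative.
NODE_CYPHER = (
    "MERGE (n:DataNode {id: $id}) "
    "SET n.name = $name, n.type = $type, n.updated_time = $updated_time"
)
EDGE_CYPHER = (
    "MATCH (a:DataNode {id: $src}), (b:DataNode {id: $dst}) "
    "MERGE (a)-[r:LINEAGE {name: $name}]->(b) "
    "SET r.type = $type, r.t_start = $t_start, r.t_end = $t_end"
)
NODE_KEYS = ("id", "name", "type", "updated_time")
EDGE_KEYS = ("src", "dst", "name", "type", "t_start", "t_end")


def to_cypher(nodes: list[dict], edges: list[dict]) -> list[tuple[str, dict]]:
    """Build (query, parameters) pairs: all nodes first, so edge MATCHes succeed."""
    statements = []
    for n in nodes:
        # A missing key raises KeyError instead of writing an incomplete node.
        statements.append((NODE_CYPHER, {k: n[k] for k in NODE_KEYS}))
    for e in edges:
        statements.append((EDGE_CYPHER, {k: e[k] for k in EDGE_KEYS}))
    return statements
```

With the official Neo4j Python driver, each pair could then be executed inside a session, e.g. `session.run(query, params)`. Using MERGE rather than CREATE keeps repeated loads from duplicating nodes or relationships with the same identifiers.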
Example 2:
On the basis of the above embodiment, the method is applied to data generated by a cigarette logistics system to construct data lineage, and a Neo4j graph database is used to store and visualize it.
Specifically, consider the process instance of lot number "N20081811" when the cigarettes enter the 'WM071' warehouse. Fig. 5 shows the data lineage queried for these cigarettes, that is, the overall data flow relationships, as displayed in the data lineage graph interface of the practical application. Data nodes of different colors represent different types of data: "interface operation", "terminal operation", "upper operation", "entering a warehouse", "transportation", "loading" and similar nodes represent process data; "05" and similar nodes represent personnel data; and "N20081…" and similar nodes represent cigarette data. Because the node icons are small, names are not displayed in full, but each data node contains its complete attribute information. Selecting a cigarette data node shows the detailed data of that lot; as shown in Fig. 6, its full lot number is "N20081811". The edges between data nodes represent the relationships between them, such as "input", "dispatcher", "composition" and "transporter".
As Figs. 5 and 6 show, data lineage enables data to be audited and traced. For example, for the batch of cigarettes with lot number "N20081811" (outside the building), the lineage visually presents all data related to the batch from transportation until arrival at the target cargo space. Checking the related data through the lineage reveals that the data of the "upper operation" dispatcher is missing from the warehousing process of this batch and that the process data error was introduced in the terminal operation, which improves the efficiency of data tracing and auditing.
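The auditing described above can be pictured as two graph queries: walk the relationships backwards to collect everything upstream of a lot, and check whether an expected relationship (such as "dispatcher") is absent. This is a minimal sketch over `(start, end, name)` tuples; the node names, the required-relationship set and the helper names `trace_upstream` and `missing_relations` are hypothetical.

```python
from collections import defaultdict, deque

Edge = tuple[str, str, str]  # (start node, end node, relationship name)


def trace_upstream(edges: list[Edge], node: str) -> set[str]:
    """All nodes whose data flows, directly or indirectly, into `node` (BFS on reversed edges)."""
    parents: dict[str, list[str]] = defaultdict(list)
    for src, dst, _ in edges:
        parents[dst].append(src)
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:             # `seen` also guards against cycles
                seen.add(p)
                queue.append(p)
    return seen


def missing_relations(edges: list[Edge], node: str, required: set[str]) -> list[str]:
    """Required relationship names that never touch `node`, e.g. a missing 'dispatcher'."""
    present = {name for src, dst, name in edges if node in (src, dst)}
    return sorted(required - present)
```

For example, with edges `("staff_05", "shelving_op", "transporter")` and `("shelving_op", "N20081811", "input")`, calling `missing_relations(edges, "shelving_op", {"transporter", "dispatcher"})` returns `["dispatcher"]`, flagging the gap for review.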
Example 3:
The non-invasive device for constructing full-process cigarette warehouse-in and warehouse-out data lineage comprises a processor, a memory, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, at least one step of the non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method is implemented with the same technical effect; to avoid repetition, the details are not repeated here.
Example 4:
A computer-readable storage medium is also provided, storing a computer program that, when executed, implements at least one step of the non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method with the same technical effect; to avoid repetition, the details are not repeated here.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Conventional techniques in the above embodiments are known to those skilled in the art, and therefore, will not be described in detail herein.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
While the invention has been described in detail and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or method illustrated may be made without departing from the spirit of the disclosure. In addition, the various features and methods described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. While the invention has been disclosed in the context of certain embodiments and examples, it will be understood by those skilled in the art that the invention extends beyond the specifically disclosed embodiments to other alternative embodiments and/or uses and obvious modifications and equivalents thereof. Accordingly, the invention is not intended to be limited by the specific disclosure of preferred embodiments herein.
Parts of the invention that are not described in detail are well-known techniques.

Claims (10)

1. A non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method, characterized by comprising the following steps:
data acquisition and analysis: monitoring, through a database-level Hook function, the transactions of a relational database that stores cigarette warehouse-in and warehouse-out data, and, when a change operation is detected, collecting the related timing information, the related data and the correspondingly executed SQL expression Q and storing them in a data table;
data relationship analysis: analyzing the related timing information, the related data and the correspondingly executed SQL expression Q, identifying data nodes, and extracting the dependency relationships between data nodes and representing them as triples; storing the data acquisition results with their corresponding concrete data fields; and storing the data with their corresponding classification results;
data lineage construction: constructing the lineage of the full-process cigarette warehouse-in and warehouse-out data from the result of sorting out that data and from the results of analyzing and binding the relationships between data nodes;
and data lineage storage: storing the data lineage in a graph database as a directed acyclic graph, the graph describing the data nodes and the relationships between them in the cigarette warehouse-in and warehouse-out process.
2. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 1, characterized in that:
the change operation includes at least one of a query operation, an insert operation, and an update operation.
3. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 1 or 2, characterized in that:
the related timing information, the related data and the correspondingly executed SQL expression Q are analyzed through the following steps:
(1) Parsing of the SQL expression Q: the input SQL expression Q is converted into a keyword stream; the keyword stream is traversed according to the grammar rules and converted into an abstract syntax tree; the abstract syntax tree is traversed to identify the defined data nodes, and the dependency relationships between data nodes are extracted and represented as triples;
(2) Binding of data: when data is collected, the concrete data involved in executing the SQL expression Q is monitored and collected; the data acquisition results and their corresponding concrete data fields are stored in a data dictionary I, a hash dictionary that stores the relationship between each data field and its concrete data as key-value pairs; and, according to the result of sorting out the full-process cigarette warehouse-in and warehouse-out data, the data and their corresponding classification results are stored in a classification dictionary M.
4. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 3, characterized in that the parsing of the SQL expression Q comprises the following steps:
the SQL expression Q is split into characters, i.e., $Q = \{c_1, c_2, c_3, \ldots, c_i\}$, where $c_i$ is the $i$-th character of the SQL expression Q;
according to the lexical rule of the SQL expression, a deterministic finite automaton D is constructed:
$$f = D(S, Q, \delta, c_1) \tag{1}$$
in formula (1), S is the finite set of states defined by the lexical rules of the SQL expression Q, δ is the state transition function of the deterministic finite automaton D, and f is the keyword stream obtained after tokenization;
the deterministic finite automaton D converts Q into the keyword stream $f = \{C_1, C_2, C_3, \ldots, C_i\}$, where $C_i$ is the $i$-th keyword of the SQL expression Q; after the keyword list is obtained, a recursive function G is constructed:
$$T = G(f, \mathit{grammar}) \tag{2}$$
in formula (2), grammar is the Backus-Naur Form (BNF) grammar of the SQL expression Q, and the recursive function G recursively converts the keyword stream f into the abstract syntax tree $T = (f, R)$, where R is the set of connections between keywords in f, $R = \{(C_1, C_i), (C_j, C_k), \ldots\}$;
in the abstract syntax tree T, the data nodes involved in the SQL expression Q lie on the leaf nodes, and their parent nodes carry the concrete semantic relationships;
defining a function P to traverse the abstract syntax tree T from top to bottom to obtain the relationship between data in the SQL expression Q, wherein:
$$RD = P(T) \tag{3}$$
in formula (3), RD is the data-to-data relationship in the SQL expression Q obtained from T through the function P, $RD \in (E, L, S)$, where $E = \{datanode_1, datanode_2, \ldots, datanode_n\}$ is the set of data nodes involved in the full cigarette warehouse-in and warehouse-out process, and datanode is the instantiated representation of a data node; $L = \{l_1, l_2, \ldots, l_r\}$ is the set of relationships between data nodes; $S \subseteq E \times L \times E$ is the set of triples describing the relationships between data nodes; after RD is obtained, RD is bound to the recorded concrete data.
5. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 4, characterized in that:
the correspondence U between a data node and its concrete data is obtained through a function H:
$$U = H(datanode, I), \quad datanode \in E \tag{4}$$ and/or
the data Type of a data field is obtained from the field's name through a function GT:
$$Type = GT(datanode, M), \quad datanode \in E \tag{5}$$
6. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 5, characterized in that constructing the lineage of the full-process cigarette warehouse-in and warehouse-out data from the result of sorting out that data and from the results of analyzing and binding the relationships between data nodes comprises the following steps:
defining a data lineage graph $GL = (E, RD)$, where E is the set of data nodes in the graph, $E \in \{DataNode_1, DataNode_2, DataNode_3, \ldots, DataNode_i\}$, each data node being defined as:
$$DataNode: \langle ID, name_E, type_E, data, updated\_time \rangle \tag{6}$$
where ID is the unique identifier of the data node, $name_E$ is the name of the data node, $type_E$ is the type of the data node, data is the concrete data of the data node, and updated_time is the update timing information of the data field; RD in the data lineage graph is the relationship between data nodes, defined as:
$$RD: \langle datanode, [t_{start}, t_{end}], type_{RD}, name_{RD}, [attr_1, attr_2], \ldots \rangle \tag{7}$$
where $t_{start}$ and $t_{end}$ are respectively the start and end times of the lineage relationship, $name_{RD}$ is the name of the relationship, $type_{RD}$ is the type of the relationship, and $attr_1$ and $attr_2$ are attribute data implied in the relationship.
7. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 6, characterized in that:
after the data nodes and relationships are defined from the obtained data relationship RD, the data lineage is constructed as follows:
input:
the SQL expression Q, the data dictionary I of the concrete data involved in executing Q, the execution timing information T, the BNF grammar SQL_BNF of the SQL language, and the classification dictionary M of the full-process cigarette warehouse-in and warehouse-out data;
output: the data lineage graph GL, obtained through the following steps:
obtaining the keyword stream f of Q through formula (1);
inputting the keyword stream f and SQL_BNF into formula (2) to obtain the abstract syntax tree T corresponding to Q;
inputting the abstract syntax tree T into formula (3) to obtain the data relationship RD in Q;
inputting I into formula (4) to obtain the correspondence between the data fields in Q and their concrete data;
inputting M into formula (5) to obtain the correspondence between the data fields in Q and their data types;
defining the data nodes E and relationships RD of the data lineage graph GL through formulas (6) and (7);
and outputting the data lineage graph GL.
8. The non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction method according to claim 7, characterized in that:
whether the data lineage graph GL contains a cycle is determined through DFS(DataNode); if it does, the relationship whose in-degree node is DataNode and whose relationship name is 'from' is deleted through DEL(StartNode, DataNode, 'from').
9. A non-invasive full-process cigarette warehouse-in and warehouse-out data lineage construction device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, performs at least one step of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed, performs at least one step of the method according to any one of claims 1 to 8.
CN202211717745.0A 2022-12-30 2022-12-30 Non-invasive cigarette warehouse-in and warehouse-out full-process data blood margin construction method and device Active CN115687309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717745.0A CN115687309B (en) 2022-12-30 2022-12-30 Non-invasive cigarette warehouse-in and warehouse-out full-process data blood margin construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211717745.0A CN115687309B (en) 2022-12-30 2022-12-30 Non-invasive cigarette warehouse-in and warehouse-out full-process data blood margin construction method and device

Publications (2)

Publication Number Publication Date
CN115687309A (en) 2023-02-03
CN115687309B (en) 2023-04-18

Family

ID=85057022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211717745.0A Active CN115687309B (en) 2022-12-30 2022-12-30 Non-invasive cigarette warehouse-in and warehouse-out full-process data blood margin construction method and device

Country Status (1)

Country Link
CN (1) CN115687309B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041739A1 (en) * 2010-08-12 2012-02-16 Heartflow, Inc. Method and System for Patient-Specific Modeling of Blood Flow
CN111782738A (en) * 2020-08-14 2020-10-16 北京斗米优聘科技发展有限公司 Method and device for constructing database table level blood relationship
CN112328667A (en) * 2020-07-17 2021-02-05 四川长宁天然气开发有限责任公司 Shale gas field ground engineering digital handover method based on data blooding margin
CN112818015A (en) * 2021-01-21 2021-05-18 广州汇通国信科技有限公司 Data tracking method, system and storage medium based on data blood margin analysis
CN113934750A (en) * 2021-10-26 2022-01-14 上海泽字信息科技有限公司 Data blood relationship analysis method based on compiling mode
CN114036130A (en) * 2021-11-09 2022-02-11 中国建设银行股份有限公司 Metadata analysis processing method and device
CN114356964A (en) * 2022-01-04 2022-04-15 网易(杭州)网络有限公司 Data blood margin construction method and device, storage medium and electronic equipment
CN115328894A (en) * 2022-06-23 2022-11-11 中兴智慧(北京)技术有限公司 Data processing method based on data blood margin
CN115409541A (en) * 2022-08-08 2022-11-29 浙江中烟工业有限责任公司 Cigarette brand data processing method based on data blood relationship

Non-Patent Citations (3)

Title
ZHIWEN SHI; YANYI CHU; YONGHONG ZHANG; YANJING WANG; DONG-QING WEI: "Prediction of Blood-Brain Barrier Permeability of Compounds by Fusing Resampling Strategies and eXtreme Gradient Boosting" *
李亚洲; 陈坚: "On the Innovation of the Data Governance System of Public Security Organs" *
李春梅; 张星; 耿慧拯; 杨亭亭; 张鑫月; 郭斯栩: "Constructing a Data Analysis Method Based on Data Lineage" *

Also Published As

Publication number Publication date
CN115687309B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant