CN112540989B - Data right-determining and managing method based on data exchange log - Google Patents
- Publication number: CN112540989B
- Application number: CN202011423098.3A
- Authority: CN (China)
- Prior art keywords: data, sub, railway, dimension, dimensions
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Abstract
The invention provides a data right-determining and managing method based on a data exchange log. The method comprises the following steps: acquiring a data exchange log and extracting from it the log information that contains data exchange rules; performing data analysis on the log information and, from the analysis result, obtaining the data tables and data items that participate in data exchange; marking each data table and data item with information for five sub-dimensions (security, time, space, business and technology) according to predefined rules; performing role and user right determination on each data table and data item to determine its data rights relationship; constructing a railway data rights matrix from these data rights relationships according to the structure of the railway system data map; and performing hierarchical, classified management of railway data objects by using the railway data rights matrix. The method can improve the efficiency, accuracy and degree of automation with which data rights relationships are acquired.
Description
Technical Field
The invention relates to the technical field of railway data resource management, and in particular to a data right-determining and managing method based on a data exchange log.
Background
In the practice of integrating and utilizing railway data resources, several problems remain: the main data center is not fully loaded with data, and in particular its data cannot broadly meet the sharing requirements of the various information systems and of the data applications of the railway bureau group companies. An ecosystem in which railway data is applied widely and deeply to improve railway operational safety, reduce cost and increase efficiency has not yet formed. To fully mine the value of the data and build a good data service system, data rights must be clarified, and a railway data rights management information system must be built on the basis of the determined data rights, so as to promote the open sharing of railway data resources and provide a premise and foundation for a healthy railway digital-economy service ecosystem.
With the deepening of railway reform and the development of intelligent railway construction, the demand for railway data sharing is increasing. Under the current centralized deployment architecture of railway information systems, a large amount of data generated at stations and sections is collected directly at the headquarters of the national railway group company, and data resources distributed across vertical information systems are difficult to open up and use mutually.
A large number of systems interface with the railway data center, and the data exchange interfaces among these systems are too numerous to count. The original manual authorization mode of managing data rights no longer meets current requirements. Constructing a railway data rights matrix based on the data exchange logs among the systems therefore has the following advantages and significance:
(1) Unclear data rights are a major obstacle to data management and data flow. Clarifying, through data right determination, the ownership of data from different sources is a premise and foundation for establishing data circulation rules, realizing open data sharing and forming a big data industry;
(2) Data sharing service scenarios are extremely complex and diverse, data exchange is a typical knowledge-intensive activity, and the data exchange process is very flexible. Process mining based on rule constraints can support the analysis of data exchange rule knowledge in a specific scenario: data exchange rules are mined from the event logs of the data service process, and data right-determining rules are then derived from them;
(3) Because data rights change with the dynamic evolution of information systems and organizations, a flexible data right-determining mechanism that supports dynamic data rights management is needed. Process mining based on real process event logs can provide a feasible methodological foundation for revealing the dynamic and complex view of data ownership in a data sharing service process;
(4) Process mining based on real event logs can also reveal the objective patterns of the data exchange and data right-determining processes in data services, describe and quantitatively analyze how data ownership is dynamically allocated among organizations, and thereby mine data exchange rules and data right-determining rules.
Currently, the prior art offers no effective scheme for managing railway data rights based on data exchange logs.
Disclosure of Invention
The embodiment of the invention provides a data right-determining and managing method based on a data exchange log, which is used to determine and manage the rights to railway data effectively.
To achieve the above purpose, the invention adopts the following technical scheme.
A data right-determining and managing method based on a data exchange log includes:
Acquiring a data exchange log, and extracting from it the log information that contains data exchange rules;
Performing data analysis on the extracted log information, and obtaining the data tables and data items that participate in data exchange according to the data analysis result;
Marking each data table and data item with information for the five sub-dimensions of security, time, space, business and technology according to predefined rules, performing role and user right determination on each data table and data item in a railway data rights system, and determining the data rights relationship of each data table and data item;
Constructing a railway data rights matrix from the data rights relationships of the data tables and data items according to the structure of the railway system data map, and performing hierarchical, classified management of railway data objects by using the railway data rights matrix.
Preferably, the data analysis is performed on the extracted log information, and a data table and a data item participating in data exchange are obtained according to the data analysis result, including:
The SQL statements in the log information are converted into a tree structure; each table and its alias are found from the FROM clause, and each joined table and its alias are found from the JOIN clause; the queried fields of the corresponding tables are then obtained from the select list in the tree structure, and the GROUP BY and HAVING clauses are processed accordingly, yielding the de-duplicated correspondence between the data tables and data items that participate in data exchange.
Preferably, marking each data table and data item with information for the five sub-dimensions of security, time, space, business and technology according to predefined rules includes:
Defining railway data as having five sub-dimensions (security, time, space, business and technology), marking each data table and data item with information for these five sub-dimensions according to predefined rules, and setting initial values of the five sub-dimensions for each data table and data item;
The security sub-dimension is used to apply the corresponding desensitization and decryption processing to the data according to its security sensitivity; the time sub-dimension is used to divide the data taking its generation time as the starting point and to set the summarization period of the shared data; the space sub-dimension is used to divide the data by the spatial range in which it was generated; and the business sub-dimension is used to summarize and organize the data at different levels according to the master data dimensions contained in the data records.
Preferably, the values of the security sub-dimension include public, internal, confidential and state secret; the values of the technology sub-dimension include queryable and downloadable; the values of the time sub-dimension include real-time, minute, hour, day, month, quarter and year, and describe how long after its generation data may be authorized to each user; the values of the space sub-dimension include the whole railway network, within a railway bureau, station, section, line and interval, and describe which spatial range of data may be authorized to each user; the values of the business sub-dimension include non-summary, primary category summary and secondary category summary, where non-summary indicates that the user can see all fields of the table, with the data finally seen then determined by the other sub-dimensions, and primary category summary and secondary category summary each indicate that the user can see only some of the fields.
Preferably, constructing the railway data rights matrix from the data rights relationships of the data tables and data items according to the structure of the railway system data map includes:
Summarizing, reducing, reprocessing and/or sampling the data tables and data items according to the five sub-dimensions, and constructing the railway data rights matrix from the located data set sources of the data exchange logs and the data rights relationships of the data tables and data items, according to the structure of the railway system data map, wherein the upper level of the rows in the railway data rights matrix represents the system users who hold rights to the corresponding data, the lower level of the rows represents the data tables and data items to which those users hold rights, and the columns represent the information of the five sub-dimensions of the data tables and data items;
Acquiring new data exchange logs, obtaining the data rights relationships of the new data tables and data items, and updating the railway data rights matrix accordingly.
Preferably, performing hierarchical, classified management of the railway data objects by using the railway data rights matrix includes:
The railway data objects under hierarchical, classified management comprise data entities, data tables and data items, wherein a data entity is a set of data tables and a data table is a set of data items; a data entity has only a security sub-dimension, a data table has security and technology sub-dimensions, and a data item has time, space, security and business sub-dimensions;
Determining in sequence the security sub-dimension of the data entity, the security sub-dimension of the data table, and the time, space, business and technology sub-dimensions of the data item, wherein when the security sub-dimension of the data entity is public, the data entity needs no desensitization or decryption processing, and when it is internal, confidential or state secret, the data entity must undergo desensitization and decryption processing;
The security sub-dimension value of a data table is either inherited directly from the data entity or reset, and must be greater than or equal to the security sub-dimension of the data entity; when the technology sub-dimension of a data table is queryable, the data table and all data items under it can only be browsed online, and when it is downloadable, the data table and all data items under it can be downloaded and used; the security sub-dimension value of a data item is either inherited directly from the data table or reset, and must be greater than or equal to the security sub-dimension of the data table; data items that have time, space or business attributes can be assigned time, space or business sub-dimensions respectively.
Preferably, performing hierarchical, classified management of the railway data objects by using the railway data rights matrix further includes:
Performing unified authorization, permission application and permission review on the railway data objects according to the railway data rights matrix, and automatically generating data authorization applications in a railway data rights management system according to the generated railway data rights matrix;
Unified authorization allows a user who holds data management rights to grant other users the right to use a data table; permission application is used to apply for the right to use data items of a selected data entity; permission review allows the reviewing department to log in to the system, check whether new data applications have been submitted recently, and decide whether to approve an application after confirming the applicant and the application content;
Extracting the relevant data from the platform's original data tables and business data tables through the corresponding shared data model to generate a basic shared data set, processing it according to each user's rights relationship and data sharing dimensions to generate a railway shared data set for that user, and authorizing that shared data set to the user on the platform for sharing.
According to the technical scheme provided by the embodiments of the invention, the method can improve the efficiency, accuracy and degree of automation with which data rights relationships are acquired. The invention also has a dynamic updating mechanism, which ensures the correctness and consistency of the data.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of the implementation principle of a data right-determining and managing method based on a data exchange log according to an embodiment of the present invention;
Fig. 2 is a process flow diagram of a data right-determining and managing method based on a data exchange log according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the invention, several specific embodiments are further explained below with reference to the drawings; these embodiments in no way limit the embodiments of the invention.
The implementation principle of the data right-determining and managing method based on a data exchange log provided by the embodiment of the invention is shown schematically in Fig. 1, and the specific processing flow is shown in Fig. 2. The method comprises the following processing steps:
Step S210: acquire a data exchange log and extract from it the log information that contains data exchange rules, where the log information can be described in SQL (Structured Query Language).
Step S220: determine a data mining rule from the characteristics of the log information and a summary of the log output of existing object-relational mapping frameworks.
According to the data mining rule, the extracted log information is analyzed using a text mining method: the SQL statements in the log information are converted into a tree structure, each table and its alias are found from the FROM clause, each joined table and its alias are found from the JOIN clause, the queried fields of the corresponding tables are then obtained from the select list in the tree structure, and clauses such as GROUP BY and HAVING are processed accordingly. The data analysis result is the de-duplicated correspondence between the data tables and data items that participate in data exchange, expressed in JSON format.
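The extraction of tables, aliases and fields described above can be sketched as follows. This is a minimal illustration, not the patented parser: it assumes each log entry is a single simple SELECT statement, and it substitutes regular expressions for the tree structure, so subqueries and unqualified column names are not handled. The function name `extract_tables_and_items`, the second table `CAR_MODELS` and the sample query are hypothetical.

```python
import json
import re

# Keywords that may follow a table name and must not be mistaken for an alias.
_STOP = r"(?:WHERE|JOIN|INNER|LEFT|RIGHT|FULL|CROSS|ON|GROUP|ORDER|HAVING|LIMIT|UNION)\b"

# "FROM table [AS] alias" or "JOIN table [AS] alias"; the alias is optional.
_TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)(?:\s+(?:AS\s+)?(?!" + _STOP + r")([A-Za-z_]\w*))?",
    re.IGNORECASE,
)

# Qualified column references such as "c.CARCATEGORY".
_COLUMN_RE = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def extract_tables_and_items(sql: str) -> dict:
    """Return {table: sorted unique fields} for one simple SELECT statement."""
    alias_to_table = {}
    for table, alias in _TABLE_RE.findall(sql):
        alias_to_table[table] = table  # a table may be referenced by its own name
        if alias:
            alias_to_table[alias] = table

    fields = {table: set() for table in alias_to_table.values()}
    # Qualified columns anywhere in the statement are collected, including
    # those in ON, GROUP BY and HAVING clauses; the sets remove duplicates.
    for qualifier, column in _COLUMN_RE.findall(sql):
        table = alias_to_table.get(qualifier)
        if table is not None:
            fields[table].add(column)
    return {table: sorted(cols) for table, cols in sorted(fields.items())}


sample = (
    "SELECT c.CARCATEGORY, COUNT(c.AIMYARDCODE) "
    "FROM PB22_CARINFO c JOIN CAR_MODELS AS m ON c.MODEL_ID = m.ID "
    "GROUP BY c.CARCATEGORY HAVING COUNT(c.AIMYARDCODE) > 1"
)
print(json.dumps(extract_tables_and_items(sample), indent=2))
```

A real deployment would more likely rely on a full SQL parser so that nested queries and dialect-specific syntax are resolved through an actual syntax tree.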
Step S230: mark each data table and data item with information for the five sub-dimensions of security, time, space, business and technology according to predefined rules, and set initial values of the five sub-dimensions for each data table and data item.
The embodiment of the invention defines five sub-dimensions of railway data: security, time, space, business and technology. These five sub-dimensions are also the data sharing dimensions of railway data. The security sub-dimension is used to apply the corresponding desensitization and decryption processing to the data according to its security sensitivity (which can be classified as highly sensitive, internal or public). The time sub-dimension is used to divide the data taking its generation time as the starting point and to set the summarization period of the shared data. The space sub-dimension is used to divide the data by the spatial range in which it was generated; it can be divided into the whole railway network, railway bureaus (directly affiliated units, holding companies and joint-venture companies), stations, sections, lines, intervals and so on, as well as specific kinds of area (such as high-altitude, severely cold, and hot and humid areas). The business sub-dimension is used to summarize and organize the data at different levels according to the master data dimensions (basic code classification fields) contained in the data records; it can be divided into non-summary, primary category summary, secondary category summary, specific category data and so on. The technology sub-dimension refers to whether the data on the platform is queryable or downloadable (developer-oriented, with metadata support).
The values of the security sub-dimension include public, internal, confidential and state secret; the values of the technology sub-dimension include queryable and downloadable; the values of the time sub-dimension include real-time, minute, hour, day, month, quarter and year, and describe how long after its generation data may be authorized to each user; the values of the space sub-dimension include the whole railway network, within a railway bureau (directly affiliated units, holding companies and joint-venture companies), station, section, line and interval, and describe which spatial range of data may be authorized to each user; the values of the business sub-dimension include non-summary, primary category summary and secondary category summary. Non-summary indicates that the user can see all fields of the table, with the data finally seen then determined by the other sub-dimensions; primary category summary and secondary category summary each indicate that the user can see only some of the fields.
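The value sets of the five sub-dimensions listed above could be modeled as enumerations, as in the sketch below. The class and member names are hypothetical, the ordering of the security levels (public lowest, state secret highest) is an assumption inferred from the "greater than or equal to" rule, and the default initial values are placeholders because the source does not state them.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Security(IntEnum):
    # Assumed ordering: a larger value means a stricter level.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    STATE_SECRET = 3


class Technology(Enum):
    QUERYABLE = "queryable"
    DOWNLOADABLE = "downloadable"


class TimeGranularity(Enum):
    REAL_TIME = "real-time"
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"
    MONTH = "month"
    QUARTER = "quarter"
    YEAR = "year"


class SpaceScope(Enum):
    WHOLE_NETWORK = "whole railway network"
    RAILWAY_BUREAU = "railway bureau"
    STATION = "station"
    SECTION = "section"
    LINE = "line"
    INTERVAL = "interval"


class BusinessSummary(Enum):
    NONE = "non-summary"
    PRIMARY = "primary category summary"
    SECONDARY = "secondary category summary"


@dataclass
class SharingDimensions:
    # Placeholder initial values; the source only says that initial values are set.
    security: Security = Security.INTERNAL
    technology: Technology = Technology.QUERYABLE
    time: TimeGranularity = TimeGranularity.DAY
    space: SpaceScope = SpaceScope.RAILWAY_BUREAU
    business: BusinessSummary = BusinessSummary.NONE


item_dimensions = SharingDimensions(security=Security.CONFIDENTIAL)
```

Using an integer-backed enumeration for the security level keeps later comparisons between an entity, its tables and their items a plain `>=` check.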
Step S240: based on the existing railway data map, perform role and user right determination on each data table and data item in the railway data rights system, determine the data rights relationship of each data table and data item, and construct a railway data rights matrix according to the structure of the railway system data map.
For example, the table PB22_CARINFO and fields such as AIMYARDCODE, CARCATEGORY and CARMODELSID, mined from the interface interaction log of the transport scheduling management system, are matched in the railway data map with the user accounts that hold data permissions on that table and those fields, such as the system administrator role and the scheduling department administrator. The railway data map contains the relevant information of each railway system: the top level is the railway units and departments, the next level is the names of the systems they use, the level below that is the system user groups, and the lowest level is the system user names.
The data tables and data items are summarized, reduced, reprocessed and/or sampled according to the five sub-dimensions, and the railway data rights matrix is constructed from the located data set sources of the data exchange logs and the data rights relationships of the data tables and data items, according to the structure of the railway system data map.
In the resulting railway data rights matrix, the upper level of the rows represents the system users who hold rights to the corresponding data, the lower level of the rows represents the data tables and data items to which those users hold rights, and the columns represent the information of the five sub-dimensions of the data tables and data items. For example, the data rights information mined from the log can be represented in the matrix as follows: the administrator user of the transportation department in the transport scheduling management system holds permissions on data items such as AIMYARDCODE, CARCATEGORY and CARMODELSID of the data table PB22_CARINFO, where the business sub-dimension of the CARCATEGORY data item is true, the space sub-dimension of the AIMYARDCODE data item is true, and the security sub-dimension of the CARMODELSID data item is true.
New data exchange logs are acquired continuously, the data rights relationships of the new data tables and data items are obtained, and the railway data rights matrix is updated accordingly, which ensures the accuracy and consistency of the data in the railway data rights matrix.
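One possible in-memory shape for such a matrix, together with its incremental update from newly mined log records, is sketched below. The nested-dictionary layout, the record format, the function name `update_rights_matrix` and the user name `transport_admin` are all assumptions made for illustration; the source does not prescribe a storage format.

```python
SUB_DIMENSIONS = ("security", "time", "space", "business", "technology")


def update_rights_matrix(matrix, records):
    """Merge mined rights records into the matrix in place and return it.

    Layout: matrix[user][(table, item)] -> {sub_dimension: bool}.
    The user is the upper row level, (table, item) the lower row level,
    and the five sub-dimensions are the columns.
    """
    for record in records:
        rows = matrix.setdefault(record["user"], {})
        key = (record["table"], record["item"])
        cells = rows.setdefault(key, dict.fromkeys(SUB_DIMENSIONS, False))
        for dimension in record.get("dimensions", ()):
            if dimension not in SUB_DIMENSIONS:
                raise ValueError(f"unknown sub-dimension: {dimension}")
            cells[dimension] = True
    return matrix


# Initial construction from the example in the text.
matrix = update_rights_matrix({}, [
    {"user": "transport_admin", "table": "PB22_CARINFO",
     "item": "CARCATEGORY", "dimensions": ["business"]},
    {"user": "transport_admin", "table": "PB22_CARINFO",
     "item": "AIMYARDCODE", "dimensions": ["space"]},
    {"user": "transport_admin", "table": "PB22_CARINFO",
     "item": "CARMODELSID", "dimensions": ["security"]},
])

# A later log adds a further sub-dimension to an existing row (hypothetical).
update_rights_matrix(matrix, [
    {"user": "transport_admin", "table": "PB22_CARINFO",
     "item": "CARCATEGORY", "dimensions": ["time"]},
])
```

Because the update only sets cells and never rebuilds rows, replaying the same log records leaves the matrix unchanged, which suits continuous log ingestion.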
Step S250: perform hierarchical, classified management of the railway data objects according to the railway data rights matrix, perform unified authorization, permission application and permission review on the railway data objects, and automatically generate data authorization applications in the railway data rights management system according to the generated railway data rights matrix.
The railway data objects subject to hierarchical, classified management comprise data entities, data tables and data items, where a data entity is a set of data tables and a data table is a set of data items. The invention provides five sub-dimensions for data right determination: the security sub-dimension, the technology sub-dimension, the time sub-dimension, the space sub-dimension and the business sub-dimension. A data entity has only a security sub-dimension, a data table has security and technology sub-dimensions, and a data item has time, space, security and business sub-dimensions.
The security sub-dimension of the data entity, the security and technology sub-dimensions of the data table, and the time, space, business and technology sub-dimensions of the data item are determined in sequence. When the security sub-dimension of the data entity is public, the data entity needs no desensitization or decryption processing; when it is internal, confidential or state secret, the data entity must undergo desensitization and decryption processing.
The security sub-dimension value of a data table can be inherited directly from the data entity or reset, and must be greater than or equal to the security sub-dimension of the data entity. When the technology sub-dimension of a data table is queryable, the data table and all data items under it can only be browsed online; when it is downloadable, the data table and all data items under it can be downloaded and used.
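The inherit-or-reset rule for the security sub-dimension and the effect of the technology sub-dimension can be expressed as a small check, sketched below. The names `resolve_security` and `allowed_operations` are hypothetical, the numeric ordering of the security levels is an assumption, and treating "downloadable" as also permitting online browsing is an interpretation rather than something the text states.

```python
from enum import IntEnum
from typing import Optional


class Security(IntEnum):
    # Assumed ordering: a larger value means a stricter level.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    STATE_SECRET = 3


def resolve_security(parent: Security, override: Optional[Security] = None) -> Security:
    """Inherit the parent's level, or accept a reset that is not less strict."""
    if override is None:
        return parent
    if override < parent:
        raise ValueError(
            f"{override.name} is below the parent level {parent.name}"
        )
    return override


def needs_desensitization(level: Security) -> bool:
    """Only public data is exempt from desensitization and decryption."""
    return level is not Security.PUBLIC


def allowed_operations(technology: str) -> set:
    """Map the technology sub-dimension of a table to permitted operations."""
    if technology == "queryable":
        return {"browse"}
    if technology == "downloadable":
        return {"browse", "download"}  # assumption: download implies browse
    raise ValueError(f"unknown technology value: {technology}")


entity_level = Security.INTERNAL
table_level = resolve_security(entity_level)                         # inherited
stricter_table = resolve_security(entity_level, Security.CONFIDENTIAL)  # reset
```

The same `resolve_security` call can be reused one level down, with a data table as the parent of a data item.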
The time, space, business and security sub-dimensions of a data item are determined for each specific user, which is how authorization takes effect. The security sub-dimension value of the data item can be inherited directly from the data table or reset, but it must be greater than or equal to the security sub-dimension of the data table. Data items with time, space and business attributes can each be assigned time, space and business sub-dimensions; once authorization is completed, the user can use the summarized data. Through the business sub-dimension, the data items are reduced or even reprocessed; through the time and space sub-dimensions, the data records are reduced, so that only data of a specific spatial range within a specific time period are available to the user.
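The inheritance rule for the security sub-dimension can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Security`, `resolve_security` and `needs_desensitization` are hypothetical, and the ordering public < internal < confidential < national secret is an assumption taken from the order in which the values are listed.

```python
from enum import IntEnum
from typing import Optional


class Security(IntEnum):
    """Security sub-dimension values; the numeric ordering is an assumption."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    NATIONAL_SECRET = 3


def resolve_security(parent: Security, override: Optional[Security] = None) -> Security:
    """Inherit the parent's level, or accept a reset value that is not lower."""
    if override is None:
        return parent
    if override < parent:
        raise ValueError(f"{override.name} is below the parent level {parent.name}")
    return override


def needs_desensitization(level: Security) -> bool:
    """Only public data skips desensitization and decryption processing."""
    return level != Security.PUBLIC


entity_level = Security.INTERNAL
table_level = resolve_security(entity_level)                       # inherited from the entity
item_level = resolve_security(table_level, Security.CONFIDENTIAL)  # reset to a higher level
```

Resetting a data table or data item to a level below its parent raises an error, which mirrors the "greater than or equal to" constraint in the text.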
In the railway data structure, all original data of a data entity are denoted as set A; the original data remaining after the data entity is desensitized and decrypted are denoted as set B; the original data remaining after the data table is desensitized and decrypted are denoted as set C; the summary data with a time attribute obtained by processing data items according to the time sub-dimension are denoted as set D; the summary data with a space attribute obtained by processing data items according to the space sub-dimension are denoted as set E; the summary data with a business attribute obtained by processing data items according to the business sub-dimension are denoted as set F; the summary data processed by both the time and space sub-dimensions are denoted as set G; the summary data processed by both the time and business sub-dimensions are denoted as set H; the summary data processed by both the space and business sub-dimensions are denoted as set I; the summary data processed by the time, space and business sub-dimensions together are denoted as set J; and the data not processed by any of the time, space or business sub-dimensions are denoted as set K. According to set theory, the relationships between the sets are defined as:
C=D∪E∪F∪K
G=D∩E={x|x∈D∧x∈E}
H=D∩F={x|x∈D∧x∈F}
I=E∩F={x|x∈E∧x∈F}
J=D∩E∩F={x|x∈D∧x∈E∧x∈F}
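The set relations above can be checked with a small example. The record identifiers below are invented purely for illustration; only the relations between the sets come from the text.

```python
# Hypothetical record identifiers, invented for illustration only.
D = {"r1", "r2", "r3", "r4"}  # summarized by the time sub-dimension
E = {"r2", "r3", "r5"}        # summarized by the space sub-dimension
F = {"r3", "r4", "r6"}        # summarized by the business sub-dimension
K = {"r7"}                    # not processed by any of the three sub-dimensions

C = D | E | F | K             # C = D ∪ E ∪ F ∪ K
G = D & E                     # time and space
H = D & F                     # time and business
I = E & F                     # space and business
J = D & E & F                 # time, space and business
```

With these definitions J is always a subset of G, H and I, since a record processed by all three sub-dimensions is also processed by any two of them.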
In practice, the right-determination and dimension information acquired automatically by the data right-determining method based on event log flow mining is often incomplete and subject to change, so it needs to be complemented by manually acquired information. According to the railway data sharing dimension settings of this embodiment, for user classes with different sharing requirements, the acquired dimension information of the data items is corrected, or the corresponding parameters of the five dimensions are reset, so as to form the data sharing dimensions for each user. In addition, when an interface of the embodiment changes, the railway data ownership matrix is synchronously and dynamically updated by this method.
The shared data set generation process for a user is as follows: the data sources of the data set are located in the original data tables and service data tables; the ownership relationship between the user and the data items is determined; the data items that satisfy the conditions are selected; and data sharing dimension processing is carried out in the order of the security, time, space, business and technical sub-dimensions. The security sub-dimension eliminates the data items that do not meet the conditions and performs data desensitization and decryption; the time, space and business sub-dimensions take the highest-level setting value of each data item for screening or summarization; and the technical sub-dimension determines whether operations such as downloading may be performed on the shared data set.
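The processing order described above (security, then time, space, business and finally technical) can be sketched as a simple filter pipeline. Everything in this sketch is a simplifying assumption: the `SharingProfile` fields, the record layout and the reduction of each sub-dimension to a single filter are hypothetical, and desensitization and summarization are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SharingProfile:
    """Hypothetical per-user data sharing dimensions."""
    max_security: int                # highest security level the user may see
    day: str                         # time range, simplified to a single day
    station: str                     # spatial range, simplified to a single station
    visible_fields: Tuple[str, ...]  # business sub-dimension, simplified to a field list
    downloadable: bool               # technical sub-dimension


def build_shared_dataset(records: List[Dict], profile: SharingProfile) -> Dict:
    # 1. Security: drop records above the user's permitted level.
    rows = [r for r in records if r["security"] <= profile.max_security]
    # 2. Time: keep only records inside the authorized time range.
    rows = [r for r in rows if r["day"] == profile.day]
    # 3. Space: keep only records inside the authorized spatial range.
    rows = [r for r in rows if r["station"] == profile.station]
    # 4. Business: reduce each record to the fields the user may see.
    rows = [{k: r[k] for k in profile.visible_fields} for r in rows]
    # 5. Technical: record whether the resulting set may be downloaded.
    return {"rows": rows, "downloadable": profile.downloadable}


records = [
    {"security": 0, "day": "2020-12-08", "station": "A", "train": "G1", "load": 120},
    {"security": 2, "day": "2020-12-08", "station": "A", "train": "G2", "load": 80},
    {"security": 0, "day": "2020-12-09", "station": "A", "train": "G3", "load": 60},
    {"security": 0, "day": "2020-12-08", "station": "B", "train": "G4", "load": 40},
]
profile = SharingProfile(max_security=1, day="2020-12-08", station="A",
                         visible_fields=("train", "load"), downloadable=False)
shared = build_shared_dataset(records, profile)
```

Applying the security filter first means later stages never touch records the user is not entitled to see, which is one plausible reason for the ordering given in the text.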
According to the railway data ownership matrix, unified authorization, authority application and authority verification can be carried out on the railway data objects, and data authorization applications are automatically generated in the railway data ownership management system according to the generated railway data ownership matrix.
The automatically generated data ownership matrix can be regarded as the data authorization for the system users that hold the relevant data in the matrix. Which users of which systems may hold which data tables and data items can be read directly from the railway data ownership matrix, where the data tables and data items carry the five sub-dimensions of information mined according to the predefined rules. Data authorization applications can therefore be generated automatically from the data ownership matrix, replacing the manual preparation of such applications.
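As a rough illustration of generating authorization applications from the ownership matrix, the sketch below models the matrix as a mapping from (system user, data table, data item) to sub-dimension values. The data layout, the `generate_applications` function, the sample entries and the application fields are all hypothetical; the patent text does not specify a concrete representation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (system user, data table, data item) -> sub-dimension values.
OwnershipMatrix = Dict[Tuple[str, str, str], Dict[str, str]]


def generate_applications(matrix: OwnershipMatrix) -> List[Dict]:
    """Turn every ownership entry into a pending data authorization application."""
    applications = []
    for (user, table, item), dims in sorted(matrix.items()):
        applications.append({
            "applicant": user,
            "table": table,
            "item": item,
            "dimensions": dict(dims),  # copy so later edits do not alter the matrix
            "status": "pending",       # awaits authority verification
        })
    return applications


matrix: OwnershipMatrix = {
    ("ticketing/user_a", "train_schedule", "departure_time"):
        {"security": "internal", "time": "day", "space": "station",
         "business": "non-summary", "technical": "queryable"},
    ("dispatch/user_b", "train_schedule", "train_no"):
        {"security": "public", "time": "hour", "space": "line",
         "business": "non-summary", "technical": "downloadable"},
}
applications = generate_applications(matrix)
```

Each generated application would then pass through the authority verification step before the corresponding shared data set is released.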
Unified authorization allows a user with data management authority to grant other users the right to use a data table. Authority application is used to apply for the right to use data items of a selected data entity. Authority verification allows the auditing department to log in to the system, check whether new data applications have recently been submitted, and decide whether to approve each one after confirming the applicant and the content of the application.
Related data are extracted from the platform's original data tables and service data tables through the corresponding shared data model to generate a basic shared data set. On this basis, processing is performed according to each user's ownership relationship and data sharing dimensions to generate a railway data shared data set for each user, which is then authorized to that user for sharing on the platform.
In summary, the method of the invention can improve the efficiency, accuracy and automation degree of data right relation acquisition, and realize effective data right determination and management of railway data. The invention has a dynamic updating mechanism, and ensures the correctness and consistency of the data.
The railway data ownership matrix mined from the system interface logs can automatically generate data authorization applications, replacing manual authorization. A general mining flow is designed for the system logs generated by data exchange. The invention improves the efficiency and accuracy of acquiring data ownership relationships. The dynamic updating mechanism ensures the timeliness, correctness and consistency of the data ownership information. Classifying and grading data through the five data-right sub-dimensions allows railway data to be shared in a targeted manner while preserving data privacy and security.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (5)
1. A data right-determining and managing method based on a data exchange log, comprising:
Acquiring a data exchange log, and extracting log information containing data exchange rules in the data exchange log;
Carrying out data analysis on the extracted log information, and acquiring a data table and a data item which participate in data exchange according to a data analysis result;
Marking the information of the five sub-dimensions of security, time, space, business and technology in each data table and data item according to a predefined rule, performing role and user right determination on each data table and data item in a railway data right system, and determining the data ownership relationship of each data table and data item;
Constructing a railway data ownership matrix according to the data ownership relation of each data table and each data item and the structure of a railway system data map, and carrying out hierarchical classification management on railway data objects by utilizing the railway data ownership matrix;
the marking of the information of the five sub-dimensions of security, time, space, business and technology in each data table and data item according to the predefined rule comprises:
Setting railway data to comprise five sub-dimensions of security, time, space, business and technology, marking the information of the five sub-dimensions in each data table and data item according to a predefined rule, and setting initial values of the five sub-dimensions in each data table and data item;
the security sub-dimension is used for carrying out corresponding desensitization and decryption processing on the data according to the security sensitivity of the data; the time sub-dimension is used for dividing the data by taking the generation time of the data as a starting point and setting the summarization time period of the shared data; the space sub-dimension is used for dividing the spatial range in which the data are generated; and the business sub-dimension is used for summarizing and arranging the data at different levels according to the main data dimensions contained in the data records;
The values of the security sub-dimension include public, internal, confidential and national secret; the values of the technical sub-dimension include queryable and downloadable; the values of the time sub-dimension include real-time, minute, hour, day, month, quarter and year, and are used for describing how long after its generation data can be authorized to each user; the values of the space sub-dimension include the whole railway network, within a railway bureau, station, segment, line and interval, and are used for describing which spatial range of data can be authorized to each user; the values of the business sub-dimension include non-summary, primary large-class summary and secondary large-class summary, wherein non-summary indicates that a user can see the complete fields of a form, with the data finally seen then determined by the other sub-dimensions, primary large-class summary indicates that the user can see part of the fields, and secondary large-class summary indicates that the user can see part of the fields;
The step of performing hierarchical classification management on the railway data objects by utilizing the railway data ownership matrix comprises the following steps:
The railway data objects subject to hierarchical classification management are set to comprise data entities, data tables and data items, wherein a data entity is a set of data tables, a data table is a set of data items, the data entity has only a security sub-dimension, the data table has a security sub-dimension and a technical sub-dimension, and the data item has time, space, security and business sub-dimensions;
sequentially determining the security sub-dimension of the data entity, the security sub-dimension of the data table, the time, the space, the security and the business sub-dimension of the data item, wherein when the security sub-dimension of the data entity is public, the data entity does not need to be subjected to desensitization and decryption processing, and when the security sub-dimension is internal, confidential or national secret, the data entity needs to be subjected to desensitization and decryption processing;
The security sub-dimension value of the data table is directly inherited from the data entity or reset, and the security sub-dimension of the data table is greater than or equal to the security sub-dimension of the data entity; when the technical sub-dimension of the data table is queryable, the data table and all data items under it can only be browsed online, and when the technical sub-dimension of the data table is downloadable, the data table and all data items under it can be downloaded and used; the security sub-dimension value of the data item is directly inherited from the data table or reset, and the security sub-dimension of the data item is required to be greater than or equal to the security sub-dimension of the data table; data items with time, space and business attributes can each be assigned time, space and business sub-dimensions.
2. The method of claim 1, wherein the data analysis is performed on the extracted log information, and the data table and the data item participating in the data exchange are obtained according to the data analysis result, including:
The SQL statement in the log information is converted into a tree structure; a table and its alias are found according to the FROM clause, and the corresponding joined table and its alias are found according to the JOIN clause; the query fields of the corresponding table are obtained from the query list in the tree structure, and the GROUP BY and HAVING clauses relating to the query fields are processed accordingly, so as to obtain, after deduplication, the correspondence between the data tables and data items participating in data exchange.
3. The method of claim 1, wherein the values of the security sub-dimension include public, internal, confidential and national secret; the values of the technical sub-dimension include queryable and downloadable; the values of the time sub-dimension include real-time, minute, hour, day, month, quarter and year, and are used for describing how long after its generation data can be authorized to each user; the values of the space sub-dimension include the whole railway network, within a railway bureau, station, segment, line and interval, and are used for describing which spatial range of data can be authorized to each user; the values of the business sub-dimension include non-summary, primary large-class summary and secondary large-class summary, wherein non-summary indicates that a user can see the complete fields of the form, with the data finally seen then determined by the other sub-dimensions, primary large-class summary indicates that the user can see part of the fields, and secondary large-class summary indicates that the user can see part of the fields.
4. The method of claim 1, wherein constructing a railway data ownership matrix according to the structure of the railway system data map based on the data ownership of each data table and data item comprises:
Summarizing, reducing, reprocessing and/or sampling all the data tables and data items according to the five sub-dimensions, and constructing a railway data ownership matrix according to the result of locating the data set sources of the data exchange logs, the data ownership relationships of all the data tables and data items, and the structure of a railway system data map, wherein the upper layer of a row in the railway data ownership matrix represents a system user holding the corresponding data, the lower layer represents the data tables and data items owned by that user, and the columns represent the information of the five sub-dimensions of the data tables and data items;
Acquiring a new data exchange log, obtaining the data ownership relationships of the new data tables and data items, and updating the railway data ownership matrix accordingly.
5. The method of claim 1, wherein performing hierarchical classification management on the railway data objects using the railway data ownership matrix further comprises:
unified authorization, authority application and authority verification are carried out on the railway data objects according to the railway data ownership matrix, and data authorization applications are automatically generated in a railway data ownership management system according to the generated railway data ownership matrix;
The unified authorization is used for a user with data management authority to grant other users the right to use a data table; the authority application is used for applying for the right to use data items of a selected data entity; the authority verification is used for the auditing department to log in to the system, check whether new data applications have recently been submitted, and determine whether to approve each one after confirming the applicant and the content of the application;
Related data are extracted from the platform's original data tables and service data tables through the corresponding shared data model to generate a basic shared data set, processing is performed according to each user's ownership relationship and data sharing dimensions to generate a railway data shared data set for each user, and the railway data shared data set is authorized to that user for sharing on the platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011423098.3A CN112540989B (en) | 2020-12-08 | 2020-12-08 | Data right-determining and managing method based on data exchange log |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011423098.3A CN112540989B (en) | 2020-12-08 | 2020-12-08 | Data right-determining and managing method based on data exchange log |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112540989A (en) | 2021-03-23 |
CN112540989B (en) | 2024-05-03 |
Family
ID=75019379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011423098.3A Active CN112540989B (en) | 2020-12-08 | 2020-12-08 | Data right-determining and managing method based on data exchange log |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112540989B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114564749B (en) * | 2022-03-04 | 2022-12-23 | 厦门熙重电子科技有限公司 | User information protection method and server for smart cloud service |
CN116383777B (en) * | 2023-03-28 | 2024-02-27 | 云启智慧科技有限公司 | Data management platform and data right determining method facing data management |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016118997A (en) * | 2014-12-22 | 2016-06-30 | 株式会社日立システムズ | Path finding log analysis system and path finding log analysis method |
CN108830554A (en) * | 2018-05-29 | 2018-11-16 | 农业部规划设计研究院 | The outcome data information quality intelligent detecting method and system of task based access control model |
CN109800225A (en) * | 2018-12-24 | 2019-05-24 | 北京奇艺世纪科技有限公司 | Acquisition methods, device, server and the computer readable storage medium of operational indicator |
CN111352999A (en) * | 2020-03-06 | 2020-06-30 | 九次方大数据信息集团有限公司 | National data circulation and data right confirming method and platform based on block chain |
Non-Patent Citations (1)
Title |
---|
Liu Dongxi; Liao Zhenchi. "Scheme Design of a Data Center Management System" (《数据中心管理系统方案设计》). Office Automation (《办公自动化》). 2016, pp. 1-4. * |
Also Published As
Publication number | Publication date |
---|---|
CN112540989A (en) | 2021-03-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||