CN115934857A - Data asset classification management and storage method suitable for engineering field - Google Patents

Info

Publication number
CN115934857A
Authority
CN
China
Prior art keywords
data
fields
classification
management
data form
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211584645.5A
Other languages
Chinese (zh)
Inventor
梁斌
李天淇
熊浩
陈新喜
吴光辉
郭志鑫
李赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Eighth Engineering Division Co Ltd
Original Assignee
China Construction Eighth Engineering Division Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Eighth Engineering Division Co Ltd
Priority to CN202211584645.5A
Publication of CN115934857A

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a data asset classification management and storage method applicable to the engineering field, which comprises the following steps: determining the category of each data source; converting the data into SQL statements; through SQL semantic analysis, splitting each SQL statement that creates a data form into a data cataloguing document for that form, and adding source, grading, classification and scope information; converting the classification of the classified fields into data form names, so that new data forms are catalogued automatically; extracting the data form fields, calculating the similarity of the data fields, and classifying the similarity results; generating table creation statements according to the different classifications and creating the corresponding data forms; and, after a data form is created, automatically storing its name in the corresponding data catalog and establishing the mapping between the catalog and the entity data form. The method solves the problem of scattered and fragmented data source tables, brings data assets under management and control, and realizes the value of the data.

Description

Data asset classification management and storage method suitable for engineering field
Technical Field
The invention relates to the technical field of informatization management in the engineering field, in particular to a data asset classification management and storage method suitable for the engineering field.
Background
The construction industry has entered a stage of stable growth. The traditional mode can no longer meet the industry's requirements for high-quality development, and transformation and upgrading are imperative. With the integrated development of digitalization, informatization and intelligentization as the lever, the path for converging the construction industry with the Internet is clear: digital technology is to enable the high-quality development of the industry and drive its comprehensive upgrade toward digitalization, informatization and intelligentization.
The general data classification management methods in the prior art cannot meet the finer-grained requirements of the current building construction industry.
Specifically, the prior art lacks a data classification and storage method with finer granularity. As a result, even when data are stored in a database, they remain scattered across business departments: employees must log in to each business system to enter information, and managers must log in to different systems to review and approve it. A comprehensive master data management platform is lacking, data quality is low, effective data analysis cannot be performed, and the enterprise's data assets cannot be used to support management decisions.
Disclosure of Invention
In view of the above problems, the invention provides a data asset classification management and storage method applicable to the engineering field. It addresses the low data quality in the building construction industry and the resulting inability to perform effective data analysis, and provides a fine-grained data classification and storage method to support analysis and decision-making.
The invention is realized by the following technical scheme:
a data asset classification management and storage method suitable for the engineering field comprises the following steps:
firstly, determining the categories of data sources according to different business processes, and marking the different business processes;
secondly, converting the data into SQL statements for creating data forms, such that the SQL statements and the form names can be extracted automatically;
thirdly, through SQL semantic analysis, splitting the SQL statement for creating a data form into a data cataloguing document for that data form, and adding source, grading, classification and scope information; processing the data cataloguing document into a data form, labelling it with classification and scope, expressing the standard for the content of each data field as a regular expression, and storing it in a database;
fourthly, after the labelling work is finished, converting the classification of the classified fields into data form names, so that new data forms are catalogued automatically;
fifthly, extracting the data form fields, calculating the similarity of the data fields, and classifying the similarity results;
sixthly, generating table creation statements according to the different classifications, and creating data forms of the different classifications;
and seventhly, after a data form is created, automatically storing the name of the data form in the corresponding data catalog, and establishing the mapping relation between the catalog and the entity data form.
In an embodiment of the present invention, the categories of data sources in the first step are classified into artificial data, management data, evaluation data, process data, and result data.
In an embodiment of the present invention, converting the classification of the classified fields into data form names in the fourth step includes:
storing the classification field in XX.XX.XX.XX format, with "." as the separator marking the levels of the classification;
and converting the hierarchical naming into the naming of the data table, so that the table name reflects hierarchical management.
In the embodiment of the present invention, the similarity results are classified into three types in the fifth step: reference data, homologous data, and entity data.
In the embodiment of the invention, in the sixth step, for data forms created from reference data, the first form is selected as the entity table, and the fields of the other forms are all associated with the fields of the first form through foreign keys; for data forms created from homologous data, the table is created in the entity manner, and an additional association table is created to store the similar fields and their similarity values; for entity data, the data form is created directly.
By adopting the technical scheme, the invention has the following beneficial effects:
on the basis of fine-grained management of data classification, the method enriches the dimensions along which data are subdivided and adds standardized management of data asset cataloguing. By mapping the asset cataloguing standard onto database storage, it forms a technical scheme that is more finely subdivided for the engineering field and in which cataloguing and storage can be managed separately.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a model architecture diagram of a data asset classification management and storage method suitable for the engineering field according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Before that, the following explanations are provided for terms appearing in the text:
SQL statement: SQL stands for Structured Query Language, a database query and programming language for accessing data and for querying, updating and managing relational database systems; in short, SQL statements are the language used to operate on a database.
Cataloguing: is the process of establishing a database connection from a client to a server, either locally or remotely. The purpose is to obtain inventory information, i.e., to generate a catalog for accessing the database. The system database directory contains a list and pointers through which DB2 (relational database management system) can find known databases, whether they are on a local system or a remote system.
SQL semantic analysis: the semantic analysis is a logic stage of the SQL parsing process, and the main task is to perform context-related property examination on the basis of correct syntax, complete the legality judgment of elements such as table names, operators and types in the SQL parsing process, and detect semantic ambiguity.
Text similarity calculation: computes the semantic similarity between two short texts. The output is a real number between 0 and 1, and a larger value indicates higher semantic similarity. In this embodiment, the text similarity may be calculated using TF-IDF, with the following steps:
1. finding the keywords of the two articles using the TF-IDF algorithm;
2. taking a number of keywords (for example, 20) from each article, merging them into one set, and calculating each article's word frequency for the words in the set (relative word frequency can be used to avoid differences caused by article length);
3. generating the word frequency vector of each of the two articles;
4. calculating the cosine similarity of the two vectors; the larger the value, the more similar the two articles are.
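The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the open-source algorithm used in the embodiment. The function names (`tokenize`, `top_keywords`, `similarity`), the simple word tokenizer and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens are good enough for field names.
    return re.findall(r"[a-z0-9_]+", text.lower())


def top_keywords(tokens, doc_freq, n_docs, k):
    # Step 1: rank words by TF-IDF. The IDF is smoothed (an assumption) so that
    # words shared by both texts keep a non-zero weight.
    counts = Counter(tokens)
    scores = {
        word: (count / len(tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
        for word, count in counts.items()
    }
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


def similarity(text_a, text_b, k=20):
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    if not tokens_a or not tokens_b:
        return 0.0
    doc_freq = Counter(set(tokens_a)) + Counter(set(tokens_b))
    # Step 2: merge the top-k keywords of both texts into one vocabulary.
    vocab = sorted(
        set(top_keywords(tokens_a, doc_freq, 2, k))
        | set(top_keywords(tokens_b, doc_freq, 2, k))
    )
    # Step 3: relative word-frequency vectors over the shared vocabulary.
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vec_a = [counts_a[w] / len(tokens_a) for w in vocab]
    vec_b = [counts_b[w] / len(tokens_b) for w in vocab]
    # Step 4: cosine similarity of the two vectors.
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because word frequencies are never negative, the result stays between 0 and 1, which matches the output range described in the definition.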
Project management informatization is widely used in the engineering field, and a large amount of structured and unstructured data is generated in every link from project initiation, design, construction, procurement and material management to acceptance. The problem is how to structure project information in the computer so that it can be conveniently stored and retrieved. If storage follows the requirements of each application, the information becomes too dispersed and information islands form easily; if storage is designed for convenient retrieval, compatibility problems with the application software can arise.
In view of these problems, the invention provides a data asset classification scheme and a storage and retrieval method suitable for the engineering field. It performs secondary cataloguing of stored data, incorporates the concept of treating data as assets, and builds data standards, a classification scheme and a storage and retrieval method based on database SQL statements. This effectively addresses the shortcomings of conventional methods, such as unclear data resources, non-uniform data standards and insufficient visualization, thereby improving data utilization and reducing the difficulty of using data.
Referring to fig. 1, the data asset classification management and storage method applicable to the engineering field of the present invention mainly includes the following steps:
firstly, determining the types of data sources according to different business processes, and marking the different business processes; in this embodiment, the data sources are mainly classified into artificial data, management data, evaluation data, process data, and result data.
Secondly, the data are represented in the database as individual forms, and the data are converted into SQL statements for creating data forms, such that the SQL statements and the form names can be extracted automatically.
Thirdly, through SQL semantic analysis, the SQL statement for creating a data form is split into the original form of that data form (which becomes a data cataloguing document), and source, grading, classification and scope information are added; the data cataloguing document is processed into a data form and labelled with classification and scope, and the standard for the content of each data field is expressed as a regular expression and stored in a database. This step programmatically implements the automatic creation of cataloguing documents and the restoration of data forms.
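As a rough illustration of the third step, the sketch below splits a simplified `CREATE TABLE` statement into a cataloguing document and attaches the source, grading, classification and scope labels together with per-field regular expressions. It is a sketch under stated assumptions: a regex-based splitter stands in for real SQL semantic analysis, table-level constraints are not handled, and every name (`build_catalog`, `is_valid`, the dictionary keys, the sample table) is hypothetical.

```python
import re

# Assumption: only plain "CREATE TABLE name (col type, ...)" statements are handled.
CREATE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?P<name>\w+)\s*\((?P<body>.*)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_columns(body):
    # Split on commas that are not nested in parentheses, e.g. DECIMAL(10, 2).
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def build_catalog(sql, source, grade, classification, scope, rules=None):
    """Turn one CREATE TABLE statement into a cataloguing document (a dict)."""
    match = CREATE_RE.match(sql.strip())
    if match is None:
        raise ValueError("not a supported CREATE TABLE statement")
    rules = rules or {}
    fields = []
    for column in split_columns(match.group("body")):
        name, _, col_type = column.partition(" ")
        # The content standard of each field is kept as a regular expression.
        fields.append(
            {"field": name, "type": col_type.strip(), "pattern": rules.get(name)}
        )
    return {
        "form": match.group("name"),
        "source": source,
        "grade": grade,
        "classification": classification,
        "scope": scope,
        "fields": fields,
    }


def is_valid(catalog, field, value):
    # Check a value against the stored regular expression, if one was given.
    pattern = next(f["pattern"] for f in catalog["fields"] if f["field"] == field)
    return pattern is None or re.fullmatch(pattern, value) is not None
```

A production implementation would more likely rely on a full SQL parser, so that constraints, quoting and dialect differences are resolved before the cataloguing document is produced.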
Fourthly, after the labelling work is finished, the classification of the classified fields is first converted into the naming of the data form. The classification field is stored in the xx.xx.xx.xx.xx format, with "." as the separator marking the levels of the classification. The hierarchical naming is then converted into the naming of the data table, so that the name reflects hierarchical management, for example data table 1.1 and data table 1.1.1. This step automates the cataloguing of new data forms.
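The conversion from a dotted classification code to a table name can be sketched as follows. The `data_table` prefix, the use of underscores in the resulting name and the helper names are assumptions for illustration; the source only states that the name should reflect the hierarchy.

```python
import re

# Assumption: each level is an alphanumeric segment and levels are joined by ".".
CODE_RE = re.compile(r"\w+(?:\.\w+)*")


def table_name(code, prefix="data_table"):
    """Map a classification code such as "1.1.1" to a hierarchical table name."""
    if CODE_RE.fullmatch(code) is None:
        raise ValueError(f"invalid classification code: {code!r}")
    # "." is not valid in an unquoted SQL identifier, so it is replaced by "_".
    return prefix + "_" + code.replace(".", "_")


def parent_code(code):
    """Return the code one level up, or None for a top-level classification."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


def depth(code):
    """Number of hierarchy levels encoded in the classification code."""
    return code.count(".") + 1
```

For example, `table_name("1.1.1")` yields `data_table_1_1_1`, and `parent_code("1.1.1")` yields `"1.1"`, so the parent table can be located from the child's name alone.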
Fifthly, the fields of the data forms are extracted, and the similarity of all data fields is calculated by a text similarity method (using an open-source algorithm). The similarity results are divided into three bands, 100%, 60%-99% and 0-59%, which are classified as reference data, homologous data and entity data, respectively.
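The three bands can be expressed as a small classification function. The text does not say how values between 59% and 60%, or between 99% and 100%, are treated, so the sketch below assumes that only a score of exactly 1.0 counts as reference data and that 0.60 is the lower bound of homologous data. The function name and the labels are hypothetical.

```python
import math


def classify(score, homologous_threshold=0.60):
    """Map a similarity score in [0, 1] to one of the three data classes."""
    if not 0.0 <= score <= 1.0 + 1e-9:
        raise ValueError(f"similarity out of range: {score}")
    # Assumption: only a full match (100%) is treated as reference data.
    if math.isclose(score, 1.0, abs_tol=1e-9):
        return "reference"
    # Assumption: the 60%-99% band covers everything from 0.60 up to, not including, 1.0.
    if score >= homologous_threshold:
        return "homologous"
    return "entity"
```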
Sixthly, table creation statements are automatically generated according to the three classifications. For reference data, the first table is selected as the entity table, and the fields of the other tables are all associated with the fields of that table through foreign keys. For homologous data, the table is created in the entity manner, and an additional association table is created to store the similar fields and their similarity values. For entity data, the table is created directly.
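A minimal sketch of the sixth step is given below, using Python's built-in `sqlite3` module so that the generated statements can be executed. The table and column names, the layout of the association table and the `build_statements` helper are assumptions for illustration, and identifiers are assumed to come from the trusted cataloguing step rather than from user input.

```python
import sqlite3


def build_statements(kind, table, columns, ref_table=None, ref_column=None):
    """Return the CREATE TABLE statements for one classified data form."""
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    if kind == "entity":
        # Entity data: create the table directly.
        return [f"CREATE TABLE {table} ({cols})"]
    if kind == "reference":
        # Reference data: later tables point at the first (entity) table.
        if ref_table is None or ref_column is None:
            raise ValueError("reference tables need ref_table and ref_column")
        fk = f"FOREIGN KEY ({ref_column}) REFERENCES {ref_table}({ref_column})"
        return [f"CREATE TABLE {table} ({cols}, {fk})"]
    if kind == "homologous":
        # Homologous data: entity-style table plus an association table that
        # records similar fields and their similarity values.
        link = (
            f"CREATE TABLE {table}_similarity ("
            "field_name TEXT, similar_table TEXT, "
            "similar_field TEXT, similarity REAL)"
        )
        return [f"CREATE TABLE {table} ({cols})", link]
    raise ValueError(f"unknown classification: {kind}")


conn = sqlite3.connect(":memory:")
statements = (
    build_statements(
        "entity", "project",
        [("project_code", "TEXT PRIMARY KEY"), ("project_name", "TEXT")],
    )
    + build_statements(
        "reference", "contract",
        [("contract_no", "TEXT"), ("project_code", "TEXT")],
        ref_table="project", ref_column="project_code",
    )
    + build_statements("homologous", "supplier", [("supplier_name", "TEXT")])
)
for statement in statements:
    conn.execute(statement)
created = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

After the loop, `created` holds the names of the four tables that were built, including the extra `supplier_similarity` association table for the homologous case.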
Seventhly, after a data form is created, the name of the data form is automatically stored in the corresponding data catalog, and the mapping relation between the catalog and the entity data form is established.
Through the cataloguing document of the created tables, the conversion relation from each original table to its final storage table can be found, and the correspondence between a physical storage table and its original table can be restored in reverse, which facilitates the investigation of data discrepancies. Data health metrics can also be output, which facilitates operation and maintenance.
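The catalog-to-table mapping and its reverse lookup can be sketched with a single mapping table. The schema (`catalog_mapping` with `catalog_code`, `source_form` and `storage_table` columns) and the helper names are assumptions, since the source does not specify how the mapping is stored.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog holding the catalog-to-table mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog_mapping ("
        "catalog_code TEXT NOT NULL, "
        "source_form TEXT NOT NULL, "
        "storage_table TEXT NOT NULL, "
        "PRIMARY KEY (catalog_code, source_form))"
    )
    return conn


def register(conn, catalog_code, source_form, storage_table):
    # Called right after a data form is created, so the catalog stays in sync.
    conn.execute(
        "INSERT INTO catalog_mapping VALUES (?, ?, ?)",
        (catalog_code, source_form, storage_table),
    )


def storage_for(conn, source_form):
    # Forward lookup: original form -> final storage table.
    row = conn.execute(
        "SELECT storage_table FROM catalog_mapping WHERE source_form = ?",
        (source_form,),
    ).fetchone()
    return row[0] if row else None


def sources_for(conn, storage_table):
    # Reverse lookup: storage table -> the original forms it was built from.
    rows = conn.execute(
        "SELECT source_form FROM catalog_mapping "
        "WHERE storage_table = ? ORDER BY source_form",
        (storage_table,),
    )
    return [row[0] for row in rows]
```

Keeping both directions queryable from one table is what allows a physical storage table to be traced back to its original forms when discrepancies are investigated.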
From the foregoing it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and which are inherent to the structure. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is within the scope of the invention.

Claims (5)

1. A data asset classification management and storage method suitable for the engineering field is characterized by comprising the following steps:
firstly, determining the categories of data sources according to different business processes, and marking the different business processes;
secondly, converting the data into SQL statements for creating data forms, such that the SQL statements and the form names can be extracted automatically;
thirdly, through SQL semantic analysis, splitting the SQL statement for creating a data form into a data cataloguing document for that data form, and adding source, grading, classification and scope information; processing the data cataloguing document into a data form, labelling it with classification and scope, expressing the standard for the content of each data field as a regular expression, and storing it in a database;
fourthly, after the labelling work is finished, converting the classification of the classified fields into data form names, so that new data forms are catalogued automatically;
fifthly, extracting the data form fields, calculating the similarity of the data fields, and classifying the similarity results;
sixthly, generating table creation statements according to the different classifications, and creating data forms of the different classifications;
and seventhly, after a data form is created, automatically storing the name of the data form in the corresponding data catalog, and establishing the mapping relation between the catalog and the entity data form.
2. The data asset classification management and storage method suitable for the engineering field as claimed in claim 1, wherein the categories of data sources in the first step are artificial data, management data, evaluation data, process data and result data.
3. The data asset classification management and storage method suitable for the engineering field as claimed in claim 1, wherein converting the classification of the classified fields into data form names in the fourth step includes:
storing the classification field in XX.XX.XX.XX format, with "." as the separator marking the levels of the classification;
and converting the hierarchical naming into the naming of the data table, so that the table name reflects hierarchical management.
4. The data asset classification management and storage method suitable for the engineering field as claimed in claim 1, wherein the fifth step classifies the similarity results into three categories: reference data, homologous data, and entity data.
5. The data asset classification management and storage method suitable for the engineering field as claimed in claim 4, wherein in the sixth step, for data forms created from reference data, the first form is selected as the entity table, and the fields of the other forms are all associated with the fields of the first form through foreign keys; for data forms created from homologous data, the table is created in the entity manner, and an additional association table is created to store the similar fields and their similarity values; for entity data, the data form is created directly.
CN202211584645.5A 2022-12-09 2022-12-09 Data asset classification management and storage method suitable for engineering field Pending CN115934857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211584645.5A CN115934857A (en) 2022-12-09 2022-12-09 Data asset classification management and storage method suitable for engineering field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211584645.5A CN115934857A (en) 2022-12-09 2022-12-09 Data asset classification management and storage method suitable for engineering field

Publications (1)

Publication Number Publication Date
CN115934857A true CN115934857A (en) 2023-04-07

Family

ID=86655332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211584645.5A Pending CN115934857A (en) 2022-12-09 2022-12-09 Data asset classification management and storage method suitable for engineering field

Country Status (1)

Country Link
CN (1) CN115934857A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination