CN115048363A - Data asset label system construction method and system based on PaaS platform of nuclear power plant - Google Patents

Publication number
CN115048363A
Authority
CN
China
Prior art keywords
data
module
power plant
nuclear power
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110257015.6A
Other languages
Chinese (zh)
Inventor
周方禹
汪淑平
朱恋
庄少清
李建池
黄萍
李志昂
曹中才
胡芳
李慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Research Institute of Nuclear Power Operation
China Nuclear Power Operation Technology Corp Ltd
Original Assignee
Research Institute of Nuclear Power Operation
China Nuclear Power Operation Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute of Nuclear Power Operation and China Nuclear Power Operation Technology Corp Ltd
Priority to CN202110257015.6A
Publication of CN115048363A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The invention provides a method and system for constructing a data asset label system based on a nuclear power plant PaaS platform. Its beneficial effect is that the label system is constructed using computer technology and machine learning technology, which makes it easier to further increase the value of the data assets through big data analysis.

Description

Data asset label system construction method and system based on PaaS platform of nuclear power plant
Technical Field
The invention belongs to the field of label system construction within big data analysis technology, and particularly relates to a method and system for constructing a data asset label system based on a nuclear power plant PaaS platform.
Background
Data assets are data resources, recorded physically or electronically, that are owned or controlled by an individual or enterprise and can bring it future economic benefit. The design drawings, patents, papers, business records, business reports and similar data resources generated during the production and operation of a nuclear power plant, whether stored in physical or electronic form, are all data assets of the plant. How to manage and use these data assets to create greater benefit for the operation and production of nuclear power plants is a current challenge. Establishing a label system for the data assets allows their characteristics to be described along multiple dimensions, and also provides a foundation for mining hidden associations and latent characteristics from massive data by means of the nuclear power plant PaaS platform and big data processing technology. At present, owing to the particular nature of the nuclear power industry, there is no ready-made method for constructing a label system for nuclear power plant data assets.
Existing label system construction methods were developed for other fields and, because of the particular nature of nuclear power plant data assets, none of them fits these assets directly. Establishing a label system for the data assets of a nuclear power plant PaaS platform serves two purposes. On the one hand, it makes the data assets easier to manage with computer technology in the future. On the other hand, as big data analysis technology continues to develop and mature, mining existing data assets for potential value and associations requires a label system to optimize queries and establish association rules.
Disclosure of Invention
The invention aims to provide a data asset label system construction method and system based on a nuclear power plant PaaS platform.
The technical scheme of the invention is as follows: a data asset label system construction method based on a nuclear power plant PaaS platform comprises the following steps:
Step 001: using a data extraction module, extracting the original data related to each preset category from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file uploading time category, a category for the unit to which the data asset belongs, a data asset file property category and a data asset content information keyword category;
Step 002: transmitting the extracted original data to a data processing module, and performing data cleaning and data merging operations on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in the features;
Step 003: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a main feature field selection module; the main feature field selection module then selects a main feature field from the fields of each preset category according to the incoming data and information, and determines the label of that preset category according to the feature data corresponding to the main feature field;
Step 004: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a secondary feature field selection module; the secondary feature field selection module then selects secondary feature fields from the fields contained in the feature information of all preset categories, and transmits the feature data corresponding to the secondary feature fields to a transaction property prediction module;
Step 005: the transaction property prediction module predicting the transaction property of the data asset from the incoming feature data by using a trained softmax classification model;
Step 006: establishing the label system of the data asset according to the labels corresponding to the preset categories and the predicted transaction property label;
Step 007: uploading and storing the established label system of the data asset.
A data asset label system construction system based on a nuclear power plant PaaS platform is characterized in that the system comprises a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module and a label system storage module.
The data extraction module: determines the categories of feature information to be examined for the data assets according to data related to the operation and management of the nuclear power plant, these categories being collectively called preset categories and including safety, production and management; on this basis, it extracts original data related to the data assets from a data source of the nuclear power plant PaaS platform and transmits the data to the data processing module.
The data processing module: guided by the preset categories, performs data cleaning and data merging operations on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, where the feature information comprises one or more fields, and then transmits the feature information data to the main feature field selection module.
The main feature field selection module: for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions or equipment management, and determines the label of the category according to the feature information contained in that field.
The secondary feature field selection module: selects secondary feature fields from the fields contained in the feature information of all preset categories, for example the source of the data or file and the creation date of the data or file, and then transmits the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
The transaction property prediction module: predicts, by a machine learning method, the transaction property label of the target data asset from the incoming feature data, where the transaction properties include accident handling files, production condition records and invention patent documents.
The transaction property prediction model training module: obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
The label system construction module: establishes a label system for the data assets according to the labels of the preset categories and the predicted transaction property labels.
The label system storage module: uploads the established label system to a corresponding database for storage.
The beneficial effect of the invention is that the label system is constructed using computer technology and machine learning technology, which makes it easier to further increase the value of the data assets through big data analysis.
Drawings
Fig. 1 is a schematic diagram of a data asset label system construction system based on a nuclear power plant PaaS platform.
Detailed Description
The invention is further described in detail below with reference to the drawings and specific embodiments.
The method uses computer technology and machine learning technology to build a label system for the data assets in the nuclear power plant PaaS platform, and provides guidance for later analysis and mining of the potential value of the data assets with big data technology.
As shown in Fig. 1, the data asset label system construction system based on the nuclear power plant PaaS platform includes a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module and a label system storage module.
The data extraction module: determines the categories of feature information to be examined for the data assets according to data related to the operation and management of the nuclear power plant; these categories are collectively called preset categories and include safety, production, management and the like. On this basis, it extracts original data related to the data assets from a data source of the nuclear power plant PaaS platform and transmits the data to the data processing module.
The data processing module: guided by the preset categories, performs data cleaning and data merging operations on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, where the feature information comprises one or more fields, and then transmits the feature information data to the main feature field selection module.
The main feature field selection module: for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions or equipment management, and determines the label of the category according to the feature information contained in that field.
The secondary feature field selection module: selects secondary feature fields from the fields contained in the feature information of all preset categories, for example the source of the data or file and the creation date of the data or file. It then transmits the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
The transaction property prediction module: predicts, by a machine learning method, the transaction property label of the target data asset from the incoming feature data, where the transaction properties include accident handling files, production condition records, invention patent documents and the like.
The transaction property prediction model training module: obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
The label system construction module: establishes a label system for the data assets according to the labels of the preset categories and the predicted transaction property labels.
The label system storage module: uploads the established label system to a corresponding database for storage.
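The train-then-check behaviour of the transaction property prediction model training module can be sketched as follows. This is a minimal illustration only, assuming plain stochastic gradient descent on a softmax classifier; the text does not specify the optimizer, the learning rate or the accuracy requirement, so `train`, `target_accuracy`, `lr`, `max_epochs` and the toy samples are all hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, weights, biases):
    """Return (index of the most probable class, all class probabilities)."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs

def accuracy(samples, weights, biases):
    """Fraction of (features, class index) samples predicted correctly."""
    hits = sum(1 for x, y in samples if predict(x, weights, biases)[0] == y)
    return hits / len(samples)

def train(train_set, test_set, n_classes, target_accuracy=0.9, lr=0.5, max_epochs=500):
    """Train on train_set until test accuracy reaches target_accuracy (assumed stop rule)."""
    n_features = len(train_set[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for epoch in range(1, max_epochs + 1):
        for x, y in train_set:
            _, probs = predict(x, weights, biases)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
        if accuracy(test_set, weights, biases) >= target_accuracy:
            break
    return weights, biases, accuracy(test_set, weights, biases), epoch

# Made-up, already normalized samples: (feature vector, class index).
train_set = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
test_set = [([0.9, 0.1, 0.0], 0), ([0.1, 0.9, 0.0], 1), ([0.0, 0.1, 0.9], 2)]
weights, biases, test_accuracy, epochs_used = train(train_set, test_set, n_classes=3)
```

In practice the training and test data would come from the PaaS platform, and a library implementation of multinomial logistic regression could replace the hand-written loop.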
A data asset label system construction method based on a nuclear power plant PaaS platform comprises the following implementation steps:
Step 001: Using the data extraction module, extract the original data related to each preset category from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file uploading time category, a category for the unit to which the data asset belongs, a data asset file property category and a data asset content information keyword category.
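Category-driven extraction as described in Step 001 might look like the sketch below. It assumes the platform exposes data assets as flat records; the column names, the `PRESET_CATEGORIES` mapping and `extract_raw_data` are hypothetical and stand in for whatever interface the PaaS platform actually provides.

```python
# Hypothetical mapping from the five preset categories of Step 001
# to the raw columns that carry information about each category.
PRESET_CATEGORIES = {
    "file_format": ["file_name"],
    "upload_time": ["uploaded_at"],
    "unit": ["owning_unit"],
    "file_property": ["property"],
    "content_keywords": ["title", "summary"],
}

def extract_raw_data(records, categories=PRESET_CATEGORIES):
    """Return, per preset category, the raw columns relevant to that category."""
    extracted = {name: [] for name in categories}
    for record in records:
        for name, columns in categories.items():
            row = {"asset_id": record["asset_id"]}
            row.update({col: record.get(col) for col in columns})
            extracted[name].append(row)
    return extracted

# Made-up record standing in for one data asset on the platform.
platform_records = [
    {
        "asset_id": "A-001",
        "file_name": "pump_report.pdf",
        "uploaded_at": "2021-01-15",
        "owning_unit": "Unit 1",
        "property": "report",
        "title": "Pump inspection",
        "summary": "Quarterly inspection of feedwater pump",
    },
]
raw = extract_raw_data(platform_records)
```

Each category keeps the asset identifier so that later steps can merge the rows back together per asset.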
Step 002: Transmit the extracted original data to the data processing module, and perform data cleaning and data merging operations on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in the features.
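The cleaning and merging of Step 002 can be illustrated with a small sketch. The concrete cleaning rules used here (drop rows without an asset identifier, strip whitespace, remove exact duplicates, discard empty values) are assumptions chosen for illustration; the text does not enumerate them, and `clean` and `merge_by_category` are hypothetical names.

```python
def clean(rows):
    """Drop rows without an asset id, strip whitespace, and remove exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        if not row.get("asset_id"):
            continue
        normalized = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        key = tuple(sorted(normalized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned

def merge_by_category(extracted):
    """Merge cleaned rows into {category: {asset_id: {field: value}}}."""
    merged = {}
    for category, rows in extracted.items():
        per_asset = merged.setdefault(category, {})
        for row in clean(rows):
            fields = {k: v for k, v in row.items()
                      if k != "asset_id" and v not in (None, "")}
            per_asset.setdefault(row["asset_id"], {}).update(fields)
    return merged

# Made-up extracted rows: one padded value, one duplicate, one row with no id.
extracted = {
    "unit": [
        {"asset_id": "A-001", "owning_unit": " Unit 1 "},
        {"asset_id": "A-001", "owning_unit": "Unit 1"},
        {"asset_id": "", "owning_unit": "Unit 2"},
    ]
}
features = merge_by_category(extracted)
```

The result holds, for every preset category, the feature data per asset and, implicitly, the list of fields each feature contains.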
Step 003: Import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into the main feature field selection module. The main feature field selection module then selects a main feature field from the fields of each preset category according to the incoming data and information, and determines the label of that preset category according to the feature data corresponding to the main feature field.
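One possible reading of Step 003 is sketched below. The text does not state the criterion for choosing the main feature field, so this sketch assumes the field populated for the most assets is chosen and that its value is used directly as the category label; both the rule and the names `select_main_field` and `category_labels` are hypothetical.

```python
from collections import Counter

def select_main_field(per_asset_fields):
    """Pick the field that is populated for the most assets (assumed criterion)."""
    counts = Counter()
    for fields in per_asset_fields.values():
        counts.update(name for name, value in fields.items() if value not in (None, ""))
    if not counts:
        return None
    # Break ties alphabetically so the choice is deterministic.
    return min(counts, key=lambda name: (-counts[name], name))

def category_labels(features):
    """Return {asset_id: {category: label}} using each category's main field."""
    labels = {}
    for category, per_asset in features.items():
        main_field = select_main_field(per_asset)
        if main_field is None:
            continue
        for asset_id, fields in per_asset.items():
            if main_field in fields:
                labels.setdefault(asset_id, {})[category] = fields[main_field]
    return labels

# Made-up feature data for one preset category.
features = {
    "content_keywords": {
        "A-001": {"title": "Pump inspection", "summary": "Quarterly"},
        "A-002": {"title": "Outage plan"},
    }
}
main_field = select_main_field(features["content_keywords"])
labels = category_labels(features)
```

A real system might instead rank fields by expert rules or information content; the selection function is the only part that would need to change.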
Step 004: Import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into the secondary feature field selection module. The secondary feature field selection module then selects secondary feature fields from the fields contained in the feature information of all preset categories, and transmits the feature data corresponding to the secondary feature fields to the transaction property prediction module.
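The hand-off in Step 004 can be sketched as pooling the selected secondary fields across all categories into one ordered numeric vector for the prediction module. The field names in `SECONDARY_FIELDS`, the default of 0.0 for missing values and the function `feature_vector` are assumptions made for illustration.

```python
# Hypothetical secondary feature fields, in the fixed order the model expects.
SECONDARY_FIELDS = [
    "monthly_generation_gwh",
    "normal_operation_days",
    "safety_incident_count",
]

def feature_vector(features, asset_id, secondary_fields=SECONDARY_FIELDS):
    """Collect secondary-field values for one asset from every category."""
    pooled = {}
    for per_asset in features.values():
        pooled.update(per_asset.get(asset_id, {}))
    # Missing fields default to 0.0 (an assumption, not stated in the text).
    return [float(pooled.get(name, 0.0)) for name in secondary_fields]

# Made-up merged feature data: {category: {asset_id: {field: value}}}.
features = {
    "production": {"A-001": {"monthly_generation_gwh": "720", "normal_operation_days": 30}},
    "safety": {"A-001": {"safety_incident_count": 0}},
}
x = feature_vector(features, "A-001")
```

Keeping the field order fixed matters because each position in the vector is paired with a specific weight in the classifier.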
Step 005: The transaction property prediction module predicts the transaction property of the data asset from the incoming feature data by using the trained softmax classification model.
(1) The trained softmax classification model comprises a set of weight vectors and a set of corresponding bias values, which are respectively
$$W = \{w_1, w_2, \ldots, w_k\}$$
and $B = \{b_1, b_2, \ldots, b_k\}$, where $k$ represents the number of categories of transaction property.
(2) The input feature data $X = \{x_1, x_2, \ldots, x_n\}$ is combined with the weights and biases to obtain the intermediate values
$$z_i = w_i^{\mathsf{T}} X + b_i = \sum_{j=1}^{n} w_{ij} x_j + b_i, \quad i = 1, 2, \ldots, k,$$
where
$$w_i = \{w_{i1}, w_{i2}, \ldots, w_{in}\}$$
represents a vector consisting of $n$ weight values. The feature data may be, for example, the monthly power generation of the nuclear power plant, the number of days of normal operation of the nuclear power plant, the frequency of safety accidents and the like.
(3) According to the softmax formula, the probability values $P = \{P(y = 1 \mid X), P(y = 2 \mid X), \ldots, P(y = k \mid X)\}$ of all possible transaction properties of the data asset can be calculated. The calculation formula is as follows:
$$P(y = i \mid X) = \frac{e^{z_i}}{\sum_{c=1}^{k} e^{z_c}}, \quad i = 1, 2, \ldots, k.$$
(4) Select the maximum among the probability values of all possible transaction properties of the data asset, and take the transaction property corresponding to that maximum as the transaction property label of the data asset.
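Sub-steps (1) to (4) amount to a standard softmax classification: a linear score per class, normalization to probabilities, and an argmax. The sketch below illustrates this with made-up weights, biases and class names; `predict_transaction_property` is a hypothetical name, and a trained model would supply the real parameters.

```python
import math

def softmax(scores):
    """Normalize scores to probabilities; subtracting the max avoids overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_transaction_property(x, weights, biases, class_names):
    """Return the most probable class name and the full probability list."""
    # (2) One intermediate value per class: dot(w_i, x) + b_i.
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    # (3) Probabilities of all k classes.
    probs = softmax(scores)
    # (4) The class with the largest probability becomes the label.
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Made-up parameters for k = 3 classes and n = 2 features.
class_names = ["accident handling file", "production condition record", "invention patent document"]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
label, probs = predict_transaction_property([2.0, 1.0], weights, biases, class_names)
```

With these toy parameters the scores are 2, 1 and 0, so the first class receives the largest probability.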
Step 006: Establish the label system of the data asset according to the labels corresponding to the preset categories and the predicted transaction property label.
Step 007: Upload and store the established label system of the data asset.
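Steps 006 and 007 can be illustrated by combining the per-category labels with the predicted transaction property label and writing the result to a database. The sketch below uses an in-memory SQLite database purely as a stand-in for the platform's storage; the table name `asset_labels`, the JSON layout and both function names are hypothetical.

```python
import json
import sqlite3

def build_label_system(category_labels, transaction_labels):
    """Combine per-category labels with the predicted transaction property label."""
    system = {}
    for asset_id in sorted(set(category_labels) | set(transaction_labels)):
        entry = dict(category_labels.get(asset_id, {}))
        if asset_id in transaction_labels:
            entry["transaction_property"] = transaction_labels[asset_id]
        system[asset_id] = entry
    return system

def store_label_system(system, connection):
    """Persist one JSON document of labels per data asset."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS asset_labels "
        "(asset_id TEXT PRIMARY KEY, labels TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO asset_labels VALUES (?, ?)",
        [(asset_id, json.dumps(labels, sort_keys=True)) for asset_id, labels in system.items()],
    )
    connection.commit()

# Made-up labels for a single asset.
connection = sqlite3.connect(":memory:")
system = build_label_system(
    {"A-001": {"unit": "Unit 1", "file_format": "pdf"}},
    {"A-001": "production condition record"},
)
store_label_system(system, connection)
stored = connection.execute(
    "SELECT labels FROM asset_labels WHERE asset_id = ?", ("A-001",)
).fetchone()[0]
```

Storing the labels keyed by asset identifier keeps later label-based queries and association rule mining straightforward.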

Claims (10)

1. A data asset label system construction method based on a nuclear power plant PaaS platform, characterized by comprising the following steps:
Step 001: using a data extraction module, extracting the original data related to each preset category from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file uploading time category, a category for the unit to which the data asset belongs, a data asset file property category and a data asset content information keyword category;
Step 002: transmitting the extracted original data to a data processing module, and performing data cleaning and data merging operations on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in the features;
Step 003: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a main feature field selection module, the main feature field selection module then selecting a main feature field from the fields of each preset category according to the incoming data and information, and determining the label of that preset category according to the feature data corresponding to the main feature field;
Step 004: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a secondary feature field selection module, the secondary feature field selection module then selecting secondary feature fields from the fields contained in the feature information of all preset categories and transmitting the feature data corresponding to the secondary feature fields to a transaction property prediction module;
Step 005: the transaction property prediction module predicting the transaction property of the data asset from the incoming feature data by using a trained softmax classification model;
Step 006: establishing the label system of the data asset according to the labels corresponding to the preset categories and the predicted transaction property label;
Step 007: uploading and storing the established label system of the data asset.
2. A data asset label system construction system based on a nuclear power plant PaaS platform, characterized in that the system comprises a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module and a label system storage module.
3. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the data extraction module determines the categories of feature information to be examined for the data assets according to data related to the operation and management of the nuclear power plant, these categories being collectively called preset categories and including safety, production and management, and on this basis extracts original data related to the data assets from a data source of the nuclear power plant PaaS platform and transmits the data to the data processing module.
4. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the data processing module, guided by the preset categories, performs data cleaning and data merging operations on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, the feature information comprising one or more fields, and then transmits the feature information data to the main feature field selection module.
5. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the main feature field selection module, for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions or equipment management, and determines the label of the category according to the feature information contained in that field.
6. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the secondary feature field selection module selects secondary feature fields from the fields contained in the feature information of all preset categories, for example the source of the data or file and the creation date of the data or file, and then transmits the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
7. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the transaction property prediction module predicts, by a machine learning method, the transaction property label of the target data asset from the incoming feature data, the transaction properties including accident handling files, production condition records and invention patent documents.
8. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the transaction property prediction model training module obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
9. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the label system construction module establishes a label system for the data assets according to the labels of the preset categories and the predicted transaction property labels.
10. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the label system storage module uploads the established label system to a corresponding database for storage.
CN202110257015.6A 2021-03-09 2021-03-09 Data asset label system construction method and system based on PaaS platform of nuclear power plant Pending CN115048363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110257015.6A CN115048363A (en) 2021-03-09 2021-03-09 Data asset label system construction method and system based on PaaS platform of nuclear power plant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110257015.6A CN115048363A (en) 2021-03-09 2021-03-09 Data asset label system construction method and system based on PaaS platform of nuclear power plant

Publications (1)

Publication Number Publication Date
CN115048363A (en) 2022-09-13

Family ID: 83156349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110257015.6A Pending CN115048363A (en) 2021-03-09 2021-03-09 Data asset label system construction method and system based on PaaS platform of nuclear power plant

Country Status (1)

Country Link
CN (1) CN115048363A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination