CN115048363A - Data asset label system construction method and system based on PaaS platform of nuclear power plant - Google Patents
- Publication number
- CN115048363A (application number CN202110257015.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- power plant
- nuclear power
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Abstract
The invention aims to provide a data asset tag system construction method and system based on a nuclear power plant PaaS platform. The invention has the beneficial effects that: a label system is constructed by utilizing a computer technology and a machine learning technology, and convenience is provided for further improving the value of data assets by utilizing big data.
Description
Technical Field
The invention belongs to the field of label system construction within big data analysis technology, and particularly relates to a data asset label system construction method and system based on a nuclear power plant PaaS platform.
Background
Data assets are data resources, recorded physically or electronically, that are owned or controlled by an individual or enterprise and can bring it future economic benefit. The design drawings, patents, papers, business records, business reports and similar data resources generated during the production and operation of a nuclear power plant, whether stored in physical or electronic form, are all data assets of the plant. How to manage and use these data assets to create greater benefit for plant operation and production is a current challenge. Establishing a label system for the data assets makes it possible to characterize them along multiple dimensions, and also lays a foundation for mining hidden associations and latent features from massive data with the help of the nuclear power plant PaaS platform and big data processing technology. At present, owing to the particularities of the nuclear power industry, there is no ready-made method for constructing a label system for nuclear power plant data assets.
Existing label system construction methods were designed for other fields and, because of the particular nature of nuclear power plant data assets, none of them can be applied directly. Establishing a label system for the data assets on a nuclear power plant PaaS platform serves two purposes. On the one hand, it makes the data assets easier to manage by computer in the future. On the other hand, as big data analysis technology matures, mining the existing data assets for potential value and associations requires a label system to optimize queries and to establish association rules.
Disclosure of Invention
The invention aims to provide a data asset tag system construction method and system based on a nuclear power plant PaaS platform.
The technical scheme of the invention is as follows: a data asset tag system construction method based on a nuclear power plant PaaS platform comprises the following steps:
Step 001: using a data extraction module, extract original data related to the preset categories from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file upload time category, a category of the unit to which the data asset belongs, a data asset file property category and a data asset content keyword category;
Step 002: pass the extracted original data to a data processing module, which performs data cleaning and data merging on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in those features;
Step 003: import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a main feature field selection module. The main feature field selection module then selects a main field from the fields of each preset category according to the data and information passed in, and determines the label of that preset category from the feature data corresponding to the main field;
Step 004: import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a secondary feature field selection module. The secondary feature field selection module then selects secondary feature fields from the fields contained in the feature information of all preset categories, and passes the feature data corresponding to the secondary feature fields to a transaction property prediction module;
Step 005: the transaction property prediction module predicts the transaction property of the data asset from the feature data passed in, using a trained softmax classification model;
Step 006: establish the label system of the data asset from the labels corresponding to the preset categories and the predicted transaction type label;
Step 007: upload and store the established label system of the data asset.
A data asset label system construction system based on a nuclear power plant PaaS platform, characterized in that the system comprises a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module and a label system storage module.
The data extraction module: determines, from data relating to the operation and management of the nuclear power plant, the categories of feature information by which the data assets are examined. These categories are collectively called the preset categories and include safety, production and management. On this basis, the module extracts original data related to the data assets from the data sources of the nuclear power plant PaaS platform and passes it to the data processing module.
The data processing module: guided by the preset categories, performs data cleaning and data merging on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, where the feature information comprises one or more fields, and then passes the feature information data to the main feature field selection module.
The main feature field selection module: for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions or equipment management, and determines the label of the category from the feature information contained in that field.
The secondary feature field selection module: selects secondary fields from the fields contained in the feature information of all preset categories, where the secondary fields include the source of the data or file, the creation date of the data or file, and the like. It then passes the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
The transaction property prediction module: predicts, by a machine learning method, the transaction property label of the target data asset from the feature data passed in, where the transaction properties include accident handling file, production condition record and invention patent document.
The transaction property prediction model training module: obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
The label system construction module: establishes a label system for the data asset from the labels of the preset categories and the predicted transaction property label.
The label system storage module: uploads the established label system to the corresponding database for storage.
The invention has the beneficial effects that: a label system is constructed by utilizing a computer technology and a machine learning technology, and convenience is provided for further improving the value of data assets by utilizing big data.
Drawings
Fig. 1 is a schematic diagram of a data asset tag system construction system based on a nuclear power plant PaaS platform.
Detailed Description
The invention is further described in detail below with reference to the drawings and specific embodiments.
The method uses computer technology and machine learning to build a label system for the data assets on a nuclear power plant PaaS platform, and so provides guidance for later analysis and mining of the potential value of those data assets with big data technology.
As shown in Fig. 1, the data asset label system construction system based on the nuclear power plant PaaS platform includes a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module, and a label system storage module.
The data extraction module: determines, from data relating to the operation and management of the nuclear power plant, the categories of feature information by which the data assets are examined. These categories are collectively called the preset categories and include safety, production, management, etc. On this basis, the module extracts original data related to the data assets from the data sources of the nuclear power plant PaaS platform and passes it to the data processing module.
The data processing module: guided by the preset categories, performs data cleaning and data merging on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, where the feature information comprises one or more fields, and then passes the feature information data to the main feature field selection module.
The main feature field selection module: for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions, equipment management, etc., and determines the label of the category from the feature information contained in that field.
The secondary feature field selection module: selects secondary fields from the fields contained in the feature information of all preset categories, where the secondary fields include the source of the data or file, the creation date of the data or file, and the like. It then passes the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
The transaction property prediction module: predicts, by a machine learning method, the transaction property label of the target data asset from the feature data passed in, where the transaction properties include accident handling file, production condition record, invention patent document and the like.
The transaction property prediction model training module: obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
The label system construction module: establishes a label system for the data asset from the labels of the preset categories and the predicted transaction property label.
The label system storage module: uploads the established label system to the corresponding database for storage.
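The train-then-check loop of the transaction property prediction model training module can be sketched as follows. The document does not specify the training algorithm, so plain stochastic gradient descent on the cross-entropy loss is assumed here; the names `train_softmax`, `target_accuracy`, `learning_rate` and `max_epochs`, and the toy data, are hypothetical and only illustrate the "train, test, repeat until the accuracy meets the requirement" idea.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(x, W, B):
    """Probabilities of each class for feature vector x."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(W, B)]
    return softmax(scores)

def accuracy(data, W, B):
    """Fraction of (features, class_index) pairs predicted correctly."""
    correct = 0
    for x, y in data:
        probs = class_probabilities(x, W, B)
        if max(range(len(probs)), key=probs.__getitem__) == y:
            correct += 1
    return correct / len(data)

def train_softmax(train_data, test_data, n_classes,
                  target_accuracy=0.9, learning_rate=0.5, max_epochs=1000):
    """Train with SGD and stop once test accuracy reaches the target (assumed criterion)."""
    n_features = len(train_data[0][0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    B = [0.0] * n_classes
    acc, epoch = 0.0, 0
    for epoch in range(1, max_epochs + 1):
        for x, y in train_data:
            probs = class_probabilities(x, W, B)
            for i in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score i.
                grad = probs[i] - (1.0 if i == y else 0.0)
                for j in range(n_features):
                    W[i][j] -= learning_rate * grad * x[j]
                B[i] -= learning_rate * grad
        acc = accuracy(test_data, W, B)
        if acc >= target_accuracy:
            break
    return W, B, acc, epoch

# Toy, already-scaled data: two features, three transaction property classes.
train_data = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
              ([0.0, 1.0], 1), ([0.1, 0.9], 1),
              ([0.0, 0.0], 2), ([0.1, 0.1], 2)]
test_data = [([0.95, 0.05], 0), ([0.05, 0.95], 1), ([0.05, 0.05], 2)]

W, B, acc, epochs = train_softmax(train_data, test_data, n_classes=3)
print(f"test accuracy {acc:.2f} after {epochs} epoch(s)")
```

In practice the stopping threshold would be whatever accuracy the plant operator requires, and the test set would be held out from the PaaS platform data rather than hand-written.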
A data asset tag system construction method based on a nuclear power plant PaaS platform comprises the following implementation steps:
Step 001: Using the data extraction module, extract original data related to the preset categories from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file upload time category, a category of the unit to which the data asset belongs, a data asset file property category and a data asset content keyword category.
Step 002: Pass the extracted original data to the data processing module, which performs data cleaning and data merging on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in those features.
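A minimal sketch of the cleaning and merging in Step 002 is given below. The document does not say which cleaning rules are applied, so trimming whitespace, dropping empty values and merging records by a shared identifier are assumptions, as are the field names (`asset_id`, `file_format`, `unit`, `upload_time`).

```python
from collections import defaultdict

def clean_record(record):
    """Strip whitespace from string values and drop empty or missing fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned

def clean_and_merge(raw_records, id_field="asset_id"):
    """Clean raw records, then merge those that describe the same data asset."""
    merged = defaultdict(dict)
    for record in raw_records:
        cleaned = clean_record(record)
        asset_id = cleaned.get(id_field)
        if asset_id is None:
            continue  # the record cannot be attributed to any asset
        merged[asset_id].update(cleaned)
    return dict(merged)

raw = [
    {"asset_id": "A1", "file_format": " pdf ", "unit": ""},
    {"asset_id": "A1", "unit": "Unit 1", "upload_time": "2021-03-09"},
    {"asset_id": None, "file_format": "docx"},
]
features = clean_and_merge(raw)
print(features)
```

Here the two records for asset `A1` collapse into one feature record, and the record without an identifier is discarded.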
Step 003: Import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into the main feature field selection module. The main feature field selection module then selects a main field from the fields of each preset category according to the data and information passed in, and determines the label of that preset category from the feature data corresponding to the main field.
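Step 003 can be illustrated with the sketch below. The document does not state how the main field is chosen, so this example assumes a simple criterion: within each preset category, take the candidate field that is filled in for the most records. The function names and field names are hypothetical.

```python
def select_main_field(records, candidate_fields):
    """Pick the candidate field populated in the most records (assumed criterion)."""
    def coverage(field):
        return sum(1 for r in records if r.get(field) not in (None, ""))
    return max(candidate_fields, key=coverage)

def category_labels(records, category_fields):
    """Choose a main field per preset category and read each asset's label from it."""
    main_fields = {
        category: select_main_field(records, fields)
        for category, fields in category_fields.items()
    }
    labels = [
        {category: r.get(field) for category, field in main_fields.items()}
        for r in records
    ]
    return main_fields, labels

records = [
    {"file_ext": "pdf", "mime": None, "upload_year": 2021},
    {"file_ext": "dwg", "mime": "image/vnd.dwg", "upload_year": 2020},
]
category_fields = {
    "file_format": ["mime", "file_ext"],
    "upload_time": ["upload_year"],
}
main_fields, labels = category_labels(records, category_fields)
print(main_fields)
print(labels)
```

Any other selection rule, such as expert-defined priorities per category, would fit the same interface.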
Step 004: Import the processed feature data corresponding to the preset categories, together with the fields contained in the features, into the secondary feature field selection module. The secondary feature field selection module then selects secondary feature fields from the fields contained in the feature information of all preset categories, and passes the feature data corresponding to the secondary feature fields to the transaction property prediction module.
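Before the secondary feature data can be fed to a classifier it has to be turned into numbers. The sketch below shows one possible encoding, assuming the secondary fields are the data source and the creation date mentioned for the secondary feature field selection module; the one-hot encoding, the age-in-years feature and all names are illustrative assumptions.

```python
from datetime import date

def encode_secondary_features(record, known_sources, reference_date):
    """Encode the source as a one-hot vector and the creation date as an age in years."""
    one_hot = [1.0 if record.get("source") == s else 0.0 for s in known_sources]
    created = record.get("created")
    age_years = (reference_date - created).days / 365.25 if created else 0.0
    return one_hot + [age_years]

record = {"source": "operations", "created": date(2020, 3, 9)}
vector = encode_secondary_features(
    record,
    known_sources=["operations", "engineering"],
    reference_date=date(2021, 3, 9),
)
print(vector)
```

The resulting vector plays the role of the input feature data passed to the transaction property prediction module.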
Step 005: The transaction property prediction module predicts the transaction property of the data asset from the incoming feature data, using the trained softmax classification model.
(1) The trained softmax classification model comprises a set of weight vectors and a set of corresponding bias values, $W = \{w_1, w_2, \dots, w_k\}$ and $B = \{b_1, b_2, \dots, b_k\}$ respectively, where $k$ is the number of transaction property categories.
(2) The input feature data $X = \{x_1, x_2, \dots, x_n\}$ is combined with the weights and biases to obtain the intermediate values $z_i = w_i \cdot X + b_i = \sum_{j=1}^{n} w_{ij} x_j + b_i$ for $i = 1, 2, \dots, k$,
where $w_i = \{w_{i1}, w_{i2}, \dots, w_{in}\}$ is a vector of $n$ weight values. The feature data may be, for example, the monthly power generation of the nuclear power plant, the number of days of normal operation, the number of safety accidents, and the like.
(3) According to the softmax formula, the probabilities of all possible outcomes for the transaction type of the data asset, $P = \{P(y = 1 \mid X), P(y = 2 \mid X), \dots, P(y = k \mid X)\}$, can be calculated as $P(y = i \mid X) = \dfrac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$.
(4) Select the maximum among the probability values of all possible transaction types for the data asset, and take the transaction type corresponding to that maximum as the transaction type label of the data asset.
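Sub-steps (1) to (4) amount to a standard softmax classifier followed by an arg-max. A minimal sketch is shown below; the weights, biases and feature values are made-up numbers for illustration, and the function names are hypothetical. The three class names are the transaction properties named in the description.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_transaction_type(x, weights, biases, type_names):
    """Return the most probable transaction type and the full probability list."""
    # Sub-step (2): one intermediate score per class, weight vector dot x plus bias.
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    # Sub-step (3): turn the scores into probabilities.
    probs = softmax(scores)
    # Sub-step (4): the class with the largest probability is the label.
    best = max(range(len(probs)), key=probs.__getitem__)
    return type_names[best], probs

type_names = ["accident handling file", "production condition record",
              "invention patent document"]
weights = [[0.2, 0.1, 1.5],   # illustrative values, not trained parameters
           [1.0, 0.8, -0.5],
           [0.0, 0.0, 0.0]]
biases = [0.0, 0.0, 0.1]
x = [1.0, 0.5, 2.0]           # assumed to be already scaled feature values

label, probs = predict_transaction_type(x, weights, biases, type_names)
print(label, [round(p, 3) for p in probs])
```

Subtracting the maximum score before exponentiating does not change the probabilities but avoids overflow when the scores are large.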
Step 006: Establish the label system of the data asset from the labels corresponding to the preset categories and the predicted transaction type label.
Step 007: Upload and store the established label system of the data asset.
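Steps 006 and 007 can be sketched together as follows. The document only says that the label system is uploaded to "a corresponding database", so the use of an in-memory SQLite database, the table name `asset_tags` and the JSON serialization are assumptions made to keep the example self-contained.

```python
import json
import sqlite3

def build_tag_system(asset_id, category_labels, transaction_label):
    """Step 006: combine the per-category labels with the predicted transaction label."""
    tags = dict(category_labels)
    tags["transaction_type"] = transaction_label
    return {"asset_id": asset_id, "tags": tags}

def store_tag_system(conn, tag_system):
    """Step 007: persist the label system, one row per data asset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS asset_tags ("
        "asset_id TEXT PRIMARY KEY, tags TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO asset_tags (asset_id, tags) VALUES (?, ?)",
        (tag_system["asset_id"], json.dumps(tag_system["tags"], ensure_ascii=False)),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the platform database
tag_system = build_tag_system(
    "A1",
    {"file_format": "pdf", "unit": "Unit 1"},
    "production condition record",
)
store_tag_system(conn, tag_system)

row = conn.execute(
    "SELECT tags FROM asset_tags WHERE asset_id = ?", ("A1",)
).fetchone()
stored = json.loads(row[0])
print(stored)
```

Storing the tags as one JSON document per asset keeps the schema independent of the set of preset categories; a relational layout with one row per tag would suit tag-based queries better.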
Claims (10)
1. A data asset label system construction method based on a nuclear power plant PaaS platform, characterized by comprising the following steps:
Step 001: using a data extraction module, extracting original data related to the preset categories from the nuclear power plant PaaS platform, wherein the preset categories may be a data asset file format category, a data asset file upload time category, a category of the unit to which the data asset belongs, a data asset file property category and a data asset content keyword category;
Step 002: passing the extracted original data to a data processing module, and performing data cleaning and data merging on the original data to obtain, for each preset category, the corresponding feature data and the fields contained in those features;
Step 003: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a main feature field selection module, the main feature field selection module then selecting a main feature field from the fields of each preset category according to the data and information passed in, and determining the label of that preset category from the feature data corresponding to the main feature field;
Step 004: importing the processed feature data corresponding to the preset categories, together with the fields contained in the features, into a secondary feature field selection module, the secondary feature field selection module then selecting secondary feature fields from the fields contained in the feature information of all preset categories and passing the feature data corresponding to the secondary feature fields to a transaction property prediction module;
Step 005: the transaction property prediction module predicting the transaction property of the data asset from the feature data passed in, using a trained softmax classification model;
Step 006: establishing the label system of the data asset from the labels corresponding to the preset categories and the predicted transaction type label;
Step 007: uploading and storing the established label system of the data asset.
2. A data asset label system construction system based on a nuclear power plant PaaS platform, characterized in that the system comprises a data extraction module, a data processing module, a main feature field selection module, a secondary feature field selection module, a transaction property prediction module, a transaction property prediction model training module, a label system construction module and a label system storage module.
3. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the data extraction module determines, from data relating to the operation and management of the nuclear power plant, the categories of feature information by which the data assets are examined, these categories being collectively called the preset categories and including safety, production and management; on this basis, the module extracts original data related to the data assets from the data sources of the nuclear power plant PaaS platform and passes it to the data processing module.
4. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the data processing module, guided by the preset categories, performs data cleaning and data merging on the original data received from the data extraction module to obtain the feature information corresponding to each preset label category, the feature information comprising one or more fields, and then passes the feature information data to the main feature field selection module.
5. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the main feature field selection module, for each preset label category, selects a main feature field from the fields contained in the feature information of that category, for example nuclear safety, production safety, power generation conditions, equipment management, etc., and determines the label of the category from the feature information contained in that field.
6. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the secondary feature field selection module selects secondary fields from the fields contained in the feature information of all preset categories, the secondary fields including the source of the data or file, the creation date of the data or file, and the like, and then passes the feature data corresponding to the secondary feature fields to the transaction property prediction module to obtain the transaction property label of the data asset.
7. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the transaction property prediction module predicts, by a machine learning method, the transaction property label of the target data asset from the feature data passed in, the transaction properties including accident handling file, production condition record and invention patent document.
8. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the transaction property prediction model training module obtains training data and test data from the nuclear power plant PaaS platform, trains the transaction property prediction model on the training data, and checks it against the test data until the prediction accuracy meets the requirement.
9. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the label system construction module establishes a label system for the data asset from the labels of the preset categories and the predicted transaction property label.
10. The data asset label system construction system based on a nuclear power plant PaaS platform as claimed in claim 2, wherein:
the label system storage module uploads the established label system to the corresponding database for storage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110257015.6A CN115048363A (en) | 2021-03-09 | 2021-03-09 | Data asset label system construction method and system based on PaaS platform of nuclear power plant |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110257015.6A CN115048363A (en) | 2021-03-09 | 2021-03-09 | Data asset label system construction method and system based on PaaS platform of nuclear power plant |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115048363A (en) | 2022-09-13 |
Family
ID=83156349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110257015.6A Pending CN115048363A (en) | 2021-03-09 | 2021-03-09 | Data asset label system construction method and system based on PaaS platform of nuclear power plant |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115048363A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||