CN117273670B - Engineering data management system with learning function - Google Patents

Engineering data management system with learning function

Publication number
CN117273670B
Authority
CN
China
Prior art keywords
data
engineering data
module
engineering
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311567369.6A
Other languages
Chinese (zh)
Other versions
CN117273670A (en)
Inventor
丛金亮
樊昊科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuntu Huaxiang Technology Co ltd
Original Assignee
Shenzhen Yuntu Huaxiang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuntu Huaxiang Technology Co ltd
Priority to CN202311567369.6A
Publication of CN117273670A
Application granted
Publication of CN117273670B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention relates to the field of engineering data management, and in particular to an engineering data management system with a learning function. A SHAP feature evaluation model calculates the contribution of each feature of the engineering data set across all feature combinations to obtain an engineering data feature set; a target XGBoost engineering data detection model is established; real-time engineering data for projects under construction is acquired from the system, processed, and input into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state; the integrity of the real-time engineering data is judged according to that state, and if the data is incomplete or abnormal, the abnormal data is transmitted to a server for early warning and subsequent updates to the real-time engineering data are monitored. The system can monitor the supplementation and correction of abnormal and missing data, effectively ensuring the integrity and accuracy of the data.

Description

Engineering data management system with learning function
Technical Field
The invention relates to the field of engineering data management, in particular to an engineering data management system with a learning function.
Background
During engineering construction, safety management data and engineering data are important records that reflect the results of the construction work. Engineering data that is truthful, timely, valid, and standardized provides a strong guarantee for realizing engineering improvement results. Construction documentation comprises a very large number of forms and documents of many different kinds, and filling them in is highly labor-intensive. With the development of information technology, engineering data management software is now widely used to assist this work; such software can handle tasks such as entering, managing, querying, backing up, importing, and exporting engineering documents and forms. However, auditing and supervision must still be performed manually, so some data management tasks are overlooked and engineering data ends up being late, missing, duplicated, erroneous, or non-standard. How to improve the efficiency and accuracy of engineering data management is therefore a technical problem that currently needs to be solved.
Disclosure of Invention
The invention aims to solve the above problems by providing an engineering data management system with a learning function.
The technical solution adopted by the invention to achieve this purpose is an engineering data management system with a learning function, comprising the following modules:
the data acquisition module is used to acquire the archived engineering data in the system and preprocess it to obtain an engineering data set;
the feature extraction module is used to establish a SHAP feature evaluation model, use it to calculate the contribution of each individual feature in the data set to the model's prediction result to obtain SHAP values, and rank the input features by the absolute value of the target SHAP value to obtain an engineering data feature set;
the model building module is used to establish an XGBoost engineering data detection model, resample the imbalanced sample classes in the engineering data feature set with the SMOTE algorithm, and input the processed engineering data feature set into the XGBoost engineering data detection model for training to obtain a target XGBoost engineering data detection model;
the data detection module is used to acquire real-time engineering data for projects under construction in the system, process it, and input it into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state;
the data management module is used to judge the integrity of the real-time engineering data according to the construction engineering data state and, if the real-time engineering data is incomplete, notify a manager to supplement it;
and the data monitoring module is used to acquire the abnormal data in the real-time engineering data if the real-time engineering data is judged to be abnormal, transmit the abnormal data to a server for early warning, and monitor subsequent updates to the real-time engineering data.
Further, in the engineering data management system, the data acquisition module includes an acquisition sub-module, an encoding sub-module, and a processing sub-module:
the acquisition sub-module is used to acquire the archived engineering data in the system, which includes at least: project management data, hydrogeological data, construction technology data, engineering quality data, financial reporting data, construction drawing data, contract data, and construction log data;
the encoding sub-module is used to feature-encode the archived engineering data, converting its non-numerical data into numerical feature data to obtain encoded engineering data;
and the processing sub-module is used to clean the outlier data and missing-value data in the encoded engineering data to obtain the engineering data set.
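The encoding and cleaning steps above can be illustrated with a minimal sketch. The patent does not specify the encoding scheme or the cleaning rule, so integer label encoding, median imputation, and a median-absolute-deviation outlier test are assumptions chosen for illustration; the function names are hypothetical.

```python
from statistics import median


def label_encode(values):
    """Map each distinct non-numerical value to an integer code (first-seen order)."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


def clean_column(values, k=3.0):
    """Replace missing values (None) and outliers with the column median.

    Assumption: a value is an outlier when it lies more than k median
    absolute deviations (MAD) from the median.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    cleaned = []
    for v in values:
        if v is None or (mad > 0 and abs(v - med) > k * mad):
            cleaned.append(med)
        else:
            cleaned.append(v)
    return cleaned


if __name__ == "__main__":
    doc_types, mapping = label_encode(["contract", "drawing", "contract", "log"])
    print(doc_types, mapping)
    print(clean_column([10.0, 12.0, None, 11.0, 500.0]))
```

Label encoding imposes an arbitrary order on categories. Tree-based models such as XGBoost usually tolerate this, but a different encoding may suit other models better.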
Further, in the engineering data management system, the feature extraction module includes an evaluation module establishing unit, a contribution degree definition unit, a SHAP value definition unit, a data weighted average unit, and a model prediction contribution unit:
the evaluation module establishing unit is used to establish a SHAP feature evaluation model and use it to calculate the contribution of each individual feature in the data set to the model's prediction result, obtaining SHAP values;
the contribution degree definition unit is used to define the contribution degree as the degree to which a feature influences the predicted value of the SHAP feature evaluation model, that is, the importance of the feature in model prediction; SHAP values include at least positive and negative values, where a positive value indicates that the feature pushes the predicted value up and a negative value indicates that the feature pushes the predicted value down;
the SHAP value definition unit is used to define the SHAP value as [formula not reproduced in this text], where S denotes the feature subset input to the SHAP feature evaluation model and the remaining terms of the formula denote, respectively, the conditional expectation over the subset S, the expected value of the target variable, the expected value of the model prediction, and the sample set;
the data weighted average unit is used to determine that, when the SHAP feature evaluation model is a nonlinear model and the input features are correlated with one another, the SHAP value is a weighted average computed over all feature orderings, and to combine the calculated conditional expectation with the SHAP value to obtain a target SHAP value;
the model prediction contribution unit is used to analyze how each input feature contributes to the prediction result of the SHAP feature evaluation model and to rank the input features by the absolute value of the target SHAP value, obtaining the engineering data feature set.
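For reference, the standard Shapley-value definition on which SHAP is based can be written as below. This is the general textbook formulation, not necessarily the exact formula of this patent; the symbols F (the full feature set), i (the feature being scored), and f_x (the value function) are notation introduced here for illustration.

```latex
\phi_i(f, x) \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[\, f_x\bigl(S \cup \{i\}\bigr) - f_x(S) \,\Bigr],
\qquad
f_x(S) \;=\; \mathbb{E}\bigl[\, f(x) \mid x_S \,\bigr]

f(x) \;=\; \mathbb{E}\bigl[f(x)\bigr] \;+\; \sum_{i \in F} \phi_i(f, x)
```

The first line averages the marginal contribution of feature i over all subsets S of the other features, which is equivalent to averaging over all feature orderings. The second line is the additivity property: the contributions sum to the gap between the prediction for x and the expected prediction.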
Further, in the engineering data management system, the model building module includes a class processing sub-module, a model training sub-module, a model node sub-module, a one-hot encoding sub-module, and a factorization machine sub-module:
the class processing sub-module is used to obtain the engineering data feature set and resample its imbalanced sample classes with the SMOTE algorithm to obtain a class-balanced engineering data set;
the model training sub-module is used to establish an XGBoost engineering data detection model, input the class-balanced engineering data set into it for training, and perform feature crossing through tree splitting to obtain the target tree structure of the XGBoost engineering data detection model;
the model node sub-module is used to cross-combine the different features along the path from the root node to each leaf node of every tree in the target tree structure, where the number of leaf nodes is the number of new features and each sample's encoding over all leaf nodes is its new feature value;
the one-hot encoding sub-module is used to calculate the predicted probability value of each sample at the leaf nodes of each tree and one-hot encode the leaf node into which each sample falls, obtaining a sparse engineering data feature matrix;
and the factorization machine sub-module is used to input the sparse engineering data feature matrix into a factorization machine (FM) for training and classify the result through a Sigmoid function, obtaining the target XGBoost engineering data detection model.
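The SMOTE resampling step can be sketched in plain Python as follows. This is a simplified illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the patent's implementation; the function name, the parameter k, and the fixed random seed are assumptions, and at least two minority samples are required.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    abnormal_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(abnormal_samples, n_new=3):
        print(point)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplicates. Resampling is normally applied to the training split only.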
Further, in the engineering data management system, the data detection module includes a processing unit, a detection unit, and a state unit:
the processing unit is used to acquire real-time engineering data for projects under construction in the system and process it with the SHAP feature evaluation model to obtain a real-time engineering data feature set;
the detection unit is used to input the real-time engineering data feature set into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state;
and the state unit is used to determine that the construction engineering data state includes at least an incomplete state, an abnormal state, and a normal state of the engineering data.
Further, in the engineering data management system, the data management module includes a judging sub-module, a state sub-module, a detection sub-module, and a storage sub-module:
the judging sub-module is used to judge the integrity of the real-time engineering data according to the construction engineering data state and, if the engineering data is in an incomplete state, notify a manager to supplement the real-time engineering data;
the state sub-module is used to determine that the incomplete state of the engineering data includes at least: missing contract data, missing construction drawing data, and missing construction log data;
the detection sub-module is used to check the supplemented engineering data and, if the check passes, merge the supplemented engineering data into the real-time engineering data to obtain target engineering data;
and the storage sub-module is used to store the target engineering data in the database and mark the real-time engineering data as complete.
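The completeness workflow above (judge, request supplementation, re-check, store) might be sketched as follows. The record layout, the field names, and the use of a simple presence check in place of the trained detection model are all assumptions made for illustration.

```python
from enum import Enum


class DataState(Enum):
    """The three construction engineering data states named in the text."""
    INCOMPLETE = "incomplete"
    ABNORMAL = "abnormal"
    NORMAL = "normal"


# Hypothetical field names for the three document types listed as possibly missing.
REQUIRED = ("contract", "construction_drawing", "construction_log")


def missing_documents(record):
    """Return the required document types that are absent or empty in the record."""
    return [name for name in REQUIRED if not record.get(name)]


def supplement_and_store(record, supplement, database):
    """Merge supplemented documents; store and mark complete only if nothing is missing."""
    merged = {**record, **supplement}
    if missing_documents(merged):
        return False  # still incomplete: the manager would be notified again
    merged["status"] = "complete"
    database.append(merged)
    return True


if __name__ == "__main__":
    db = []
    record = {"contract": "c.pdf", "construction_drawing": None}
    print(missing_documents(record))
    print(supplement_and_store(record, {"construction_drawing": "d.dwg",
                                        "construction_log": "log.txt"}, db))
```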
Further, in the engineering data management system, the data monitoring module includes a judging sub-module, an abnormality sub-module, an early warning sub-module, and a monitoring sub-module:
the judging sub-module is used to acquire the abnormal data in the real-time engineering data if the real-time engineering data is judged to be abnormal;
the abnormality sub-module is used to determine that abnormal engineering data includes at least: abnormal contract amounts, abnormal financial data, abnormal construction log data, and abnormal detection data;
the early warning sub-module is used to transmit the abnormal data to a server for early warning, notify a manager to update the data within 24 hours, and every 24 hours acquire the updated real-time engineering data as first-time engineering data;
and the monitoring sub-module is used to input the first-time engineering data into the target XGBoost engineering data detection model for detection and judge whether it is abnormal; if it is, second early warning information is generated and transmitted to the manager.
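The two-stage warning flow can be sketched as below. The class, its method names, and the message texts are hypothetical; `is_abnormal` stands in for the target XGBoost engineering data detection model, and the 24-hour scheduling itself is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Monitor:
    # Hypothetical stand-in for the trained detection model's abnormality verdict.
    is_abnormal: Callable[[dict], bool]
    warnings: List[Tuple[str, str]] = field(default_factory=list)

    def first_warning(self, record_id: str) -> None:
        """Raise the initial warning and ask for an update within 24 hours."""
        self.warnings.append((record_id, "first warning: update the data within 24 hours"))

    def recheck(self, record_id: str, updated_data: dict) -> bool:
        """Re-run detection on the updated data once the 24-hour window has elapsed.

        Returns True if the data is now normal; otherwise records a second warning.
        """
        if self.is_abnormal(updated_data):
            self.warnings.append((record_id, "second warning: data still abnormal after update"))
            return False
        return True


if __name__ == "__main__":
    monitor = Monitor(is_abnormal=lambda d: d["contract_amount"] < 0)
    monitor.first_warning("P-1")
    print(monitor.recheck("P-1", {"contract_amount": -5}))
    print(monitor.warnings)
```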
Further, the engineering data management system also performs the following steps:
acquiring the archived engineering data in the system and preprocessing it to obtain an engineering data set;
establishing a SHAP feature evaluation model, using it to calculate the contribution of each individual feature in the data set to the model's prediction result to obtain SHAP values, and ranking the input features by the absolute value of the target SHAP value to obtain an engineering data feature set;
establishing an XGBoost engineering data detection model, resampling the imbalanced sample classes in the engineering data feature set with the SMOTE algorithm, and inputting the processed engineering data feature set into the XGBoost engineering data detection model for training to obtain a target XGBoost engineering data detection model;
acquiring real-time engineering data for projects under construction in the system, processing it, and inputting it into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state;
judging the integrity of the real-time engineering data according to the construction engineering data state and, if the real-time engineering data is incomplete, notifying a manager to supplement it;
if the real-time engineering data is judged to be abnormal, acquiring the abnormal data in the real-time engineering data, transmitting the abnormal data to a server for early warning, and monitoring subsequent updates to the real-time engineering data.
Further, the engineering data management system also performs the following steps:
establishing a SHAP feature evaluation model and using it to calculate the contribution of each individual feature in the data set to the model's prediction result, obtaining SHAP values;
the contribution degree is the degree to which a feature influences the predicted value of the SHAP feature evaluation model, that is, the importance of the feature in model prediction; SHAP values include at least positive and negative values, where a positive value indicates that the feature pushes the predicted value up and a negative value indicates that the feature pushes the predicted value down;
the SHAP value is defined as [formula not reproduced in this text], where S denotes the feature subset input to the SHAP feature evaluation model and the remaining terms of the formula denote, respectively, the conditional expectation over the subset S, the expected value of the target variable, the expected value of the model prediction, and the sample set;
when the SHAP feature evaluation model is a nonlinear model and the input features are correlated with one another, the SHAP value is a weighted average computed over all feature orderings, and the calculated conditional expectation is combined with the SHAP value to obtain a target SHAP value;
and analyzing how each input feature contributes to the prediction result of the SHAP feature evaluation model, and ranking the input features by the absolute value of the target SHAP value to obtain an engineering data feature set.
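The scoring and ranking steps above can be sketched with an exact Shapley-value computation in plain Python. This is an illustrative brute-force version that enumerates every feature subset, so it is only practical for a handful of features; in practice a library such as shap would be used. Approximating the conditional expectation by averaging over a background sample is an assumption, as are the function names and the toy model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x.

    Features outside the subset are filled from each background row and the
    predictions averaged, approximating the conditional expectation.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def rank_features(names, phi):
    """Sort features by absolute Shapley value, largest first."""
    return sorted(zip(names, phi), key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    toy_model = lambda r: 2 * r[0] - 3 * r[1]  # third feature is ignored
    phi = shapley_values(toy_model, [3.0, 2.0, 5.0], [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
    print(rank_features(["contract_amount", "log_entries", "drawing_count"], phi))
```

For the linear toy model each value reduces to the weight times the feature's deviation from the background mean, which makes the result easy to check by hand.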
Further, the engineering data management system also performs the following steps:
judging the integrity of the real-time engineering data according to the construction engineering data state and, if the engineering data is in an incomplete state, notifying a manager to supplement the real-time engineering data;
the incomplete state of the engineering data includes at least: missing contract data, missing construction drawing data, and missing construction log data;
checking the supplemented engineering data and, if the check passes, merging the supplemented engineering data into the real-time engineering data to obtain target engineering data;
storing the target engineering data in a database and marking the real-time engineering data as complete.
The beneficial effects are as follows. The data acquisition module acquires the archived engineering data in the system and preprocesses it to obtain an engineering data set; the feature extraction module establishes a SHAP feature evaluation model, uses it to calculate the contribution of each individual feature in the data set to the model's prediction result to obtain SHAP values, and ranks the input features by the absolute value of the target SHAP value to obtain an engineering data feature set; the model building module establishes an XGBoost engineering data detection model, resamples the imbalanced sample classes in the engineering data feature set with the SMOTE algorithm, and inputs the processed engineering data feature set into the XGBoost engineering data detection model for training to obtain a target XGBoost engineering data detection model; the data detection module acquires real-time engineering data for projects under construction in the system, processes it, and inputs it into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state; the data management module judges the integrity of the real-time engineering data according to the construction engineering data state and, if the real-time engineering data is incomplete, notifies a manager to supplement it; and the data monitoring module, if the real-time engineering data is judged to be abnormal, acquires the abnormal data, transmits it to a server for early warning, and monitors subsequent updates to the real-time engineering data.
The system can classify and manage large volumes of engineering data, detect problems in that data, issue timely early warnings when abnormal data is found, issue timely reminders to supplement missing data, and monitor the supplementation and correction of abnormal and missing data, thereby effectively ensuring the integrity and accuracy of the data.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a diagram illustrating a first embodiment of an engineering data management system with learning function according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a second embodiment of an engineering data management system with learning function according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a third embodiment of an engineering data management system with learning function according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The invention is described in detail below with reference to the accompanying drawings. As shown in FIG. 1, an engineering data management system with a learning function comprises the following modules:
The data acquisition module is used to acquire the archived engineering data in the system and preprocess it to obtain an engineering data set;
Specifically, this embodiment further includes an acquisition sub-module, used to acquire the archived engineering data in the system, which includes at least: project management data, hydrogeological data, construction technology data, engineering quality data, financial reporting data, construction drawing data, contract data, and construction log data; an encoding sub-module, used to feature-encode the archived engineering data, converting its non-numerical data into numerical feature data to obtain encoded engineering data; and a processing sub-module, used to clean the outlier data and missing-value data in the encoded engineering data to obtain the engineering data set.
The feature extraction module is used to establish a SHAP feature evaluation model, use it to calculate the contribution of each individual feature in the data set to the model's prediction result to obtain SHAP values, and rank the input features by the absolute value of the target SHAP value to obtain an engineering data feature set;
Specifically, this embodiment further includes an evaluation module establishing unit, used to establish a SHAP feature evaluation model and use it to calculate the contribution of each individual feature in the data set to the model's prediction result, obtaining SHAP values; a contribution degree definition unit, used to define the contribution degree as the degree to which a feature influences the predicted value of the SHAP feature evaluation model, that is, the importance of the feature in model prediction; SHAP values include at least positive and negative values, where a positive value indicates that the feature pushes the predicted value up and a negative value indicates that the feature pushes the predicted value down; and a SHAP value definition unit, used to define the SHAP value as [formula not reproduced in this text], where S denotes the feature subset input to the SHAP feature evaluation model and the remaining terms of the formula denote, respectively, the conditional expectation over the subset S, the expected value of the target variable, the expected value of the model prediction, and the sample set;
The data weighted average unit is used to determine that, when the SHAP feature evaluation model is a nonlinear model and the input features are correlated with one another, the SHAP value is a weighted average computed over all feature orderings, and to combine the calculated conditional expectation with the SHAP value to obtain a target SHAP value; the model prediction contribution unit is used to analyze how each input feature contributes to the prediction result of the SHAP feature evaluation model and to rank the input features by the absolute value of the target SHAP value, obtaining the engineering data feature set.
The model building module is used to establish an XGBoost engineering data detection model, resample the imbalanced sample classes in the engineering data feature set with the SMOTE algorithm, and input the processed engineering data feature set into the XGBoost engineering data detection model for training to obtain a target XGBoost engineering data detection model;
Specifically, this embodiment further includes a class processing sub-module, used to obtain the engineering data feature set and resample its imbalanced sample classes with the SMOTE algorithm to obtain a class-balanced engineering data set; a model training sub-module, used to establish an XGBoost engineering data detection model, input the class-balanced engineering data set into it for training, and perform feature crossing through tree splitting to obtain the target tree structure of the XGBoost engineering data detection model; a model node sub-module, used to cross-combine the different features along the path from the root node to each leaf node of every tree in the target tree structure, where the number of leaf nodes is the number of new features and each sample's encoding over all leaf nodes is its new feature value; a one-hot encoding sub-module, used to calculate the predicted probability value of each sample at the leaf nodes of each tree and one-hot encode the leaf node into which each sample falls, obtaining a sparse engineering data feature matrix; and a factorization machine sub-module, used to input the sparse engineering data feature matrix into a factorization machine (FM) for training and classify the result through a Sigmoid function, obtaining the target XGBoost engineering data detection model.
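The leaf-encoding and factorization machine stages described above can be sketched as follows. The trees are represented by hypothetical hand-written functions that return a leaf index (with a real XGBoost model the leaf indices would come from the trained booster), and the FM parameters w0, w, and v are made-up values rather than trained ones. The FM score uses the standard second-order form, with the pairwise term computed through the usual sum-of-squares identity.

```python
import math


def leaf_one_hot(sample, trees):
    """Concatenate, over all trees, the one-hot encoding of the leaf the sample reaches.

    trees: list of (leaf_fn, n_leaves) pairs, where leaf_fn(sample) returns a leaf index.
    """
    encoded = []
    for leaf_fn, n_leaves in trees:
        one_hot = [0.0] * n_leaves
        one_hot[leaf_fn(sample)] = 1.0
        encoded.extend(one_hot)
    return encoded


def fm_score(x, w0, w, v):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def sigmoid(z):
    """Squash the FM score into a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    trees = [
        (lambda s: 0 if s["amount"] < 100 else 1, 2),
        (lambda s: 0 if s["log_days"] < 5 else (1 if s["log_days"] < 20 else 2), 3),
    ]
    x = leaf_one_hot({"amount": 250, "log_days": 7}, trees)
    score = fm_score(x, 0.0, [0.1, 0.2, 0.3, 0.4, 0.5], [[1.0], [2.0], [0.0], [3.0], [0.0]])
    print(x, sigmoid(score))
```

Because each tree contributes exactly one active leaf, the encoded vector is sparse, and the FM pairwise term then models interactions between leaves of different trees.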
The data detection module is used to acquire real-time engineering data for projects under construction in the system, process it, and input it into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state;
Specifically, this embodiment further includes a processing unit, used to acquire real-time engineering data for projects under construction in the system and process it with the SHAP feature evaluation model to obtain a real-time engineering data feature set; a detection unit, used to input the real-time engineering data feature set into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state; and a state unit, used to determine that the construction engineering data state includes at least an incomplete state, an abnormal state, and a normal state of the engineering data.
The data management module is used for judging the integrity of the real-time engineering data according to the construction engineering data state, and notifying a manager to supplement the real-time engineering data if the real-time engineering data is incomplete;
Specifically, this embodiment further comprises: a judging sub-module, used for judging the integrity of the real-time engineering data according to the construction engineering data state and, if the engineering data is in an incomplete state, notifying a manager to supplement the real-time engineering data; a state sub-module, used for determining that the incomplete state of the engineering data comprises at least: missing contract data, missing construction drawing data and missing construction log data; a detection sub-module, used for detecting the supplemented engineering data and, if the detection is passed, merging the supplemented engineering data into the real-time engineering data to obtain target engineering data; and a storage sub-module, used for inputting the target engineering data into the database for storage and marking the real-time engineering data as being in a complete state.
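The flow of the data management module (judge completeness, notify, check the supplement, merge, store) can be sketched as follows. This is an illustrative outline only: the category keys, the `passes_detection` callback, the list used as a database and the message text are assumptions, not details given in the patent.

```python
REQUIRED = ("contract", "construction_drawing", "construction_log")

def missing_categories(record):
    """Names of required categories that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]

def manage_record(record, supplement, passes_detection, database, notify):
    missing = missing_categories(record)
    if missing:
        notify(f"please supplement: {', '.join(missing)}")
    # Accept only supplementary data for missing categories that passes detection.
    accepted = {name: value for name, value in supplement.items()
                if name in missing and passes_detection(name, value)}
    target = {**record, **accepted}  # merged "target engineering data"
    still_missing = missing_categories(target)
    if not still_missing:
        database.append({**target, "status": "complete"})
    return still_missing

database, messages = [], []
record = {"contract": "C-001", "construction_drawing": "D-17", "construction_log": None}
still_missing = manage_record(
    record,
    supplement={"construction_log": "log-2023-11"},
    passes_detection=lambda name, value: bool(value),
    database=database,
    notify=messages.append,
)
```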
And the data monitoring module is used for acquiring abnormal data of the real-time engineering data if the real-time engineering data is judged to be abnormal, transmitting the abnormal data to the server for early warning and monitoring updated data of the real-time engineering data.
Specifically, this embodiment further comprises: a judging sub-module, used for acquiring abnormal data of the real-time engineering data if the real-time engineering data is judged to be abnormal; an abnormality sub-module, used for determining that abnormal engineering data comprises at least: abnormal contract amounts of engineering materials, abnormal financial data of engineering materials, abnormal construction log data of engineering materials and abnormal detection data of engineering materials; an early warning sub-module, used for transmitting the abnormal data to the server for early warning, notifying a manager to update the data within 24 hours, and acquiring the updated real-time engineering data every 24 hours to obtain first real-time engineering data; and a monitoring sub-module, used for inputting the first real-time engineering data into the target XGBoost engineering data detection model for detection, judging whether the first real-time engineering data is abnormal and, if so, generating second early warning information and transmitting the second early warning information to the manager.
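The early warning and re-check cycle of the data monitoring module might look like the sketch below. The 24-hour wait is only indicated by a comment, and the function name `monitor`, the `max_checks` limit, the returned status strings and the threshold rule in the usage example are hypothetical choices for illustration.

```python
def monitor(record, is_abnormal, fetch_updated, notify, max_checks=3):
    """Warn on abnormal data, then re-check the updated data once per cycle."""
    if not is_abnormal(record):
        return "normal"
    notify("first warning: abnormal data, please update within 24 hours")
    for _ in range(max_checks):
        # A deployment would wait 24 hours here (for example via a scheduler).
        updated = fetch_updated()
        if not is_abnormal(updated):
            return "resolved"
        notify("second warning: data still abnormal after update")
    return "unresolved"

# Usage with stand-in data: the first update is still abnormal, the second is not.
updates = iter([{"contract_amount": 9_000_000}, {"contract_amount": 800_000}])
warnings = []
result = monitor(
    {"contract_amount": 9_500_000},
    is_abnormal=lambda r: r["contract_amount"] > 1_000_000,
    fetch_updated=lambda: next(updates),
    notify=warnings.append,
)
```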
The beneficial effects are as follows: the data acquisition module acquires the archived engineering data in the system and preprocesses it to obtain an engineering data set; the feature extraction module establishes a SHAP feature evaluation model, uses it to calculate the contribution of each individual feature in the data set to the model prediction result to obtain SHAP values, and ranks the input features by the absolute value of the target SHAP value to obtain an engineering data feature set; the model building module establishes an XGBoost engineering data detection model, processes the imbalanced sample categories in the engineering data feature set by SMOTE sampling, and inputs the processed engineering data feature set into the XGBoost engineering data detection model for training to obtain a target XGBoost engineering data detection model; the data detection module acquires real-time engineering data in construction in the system and inputs the processed real-time engineering data into the target XGBoost engineering data detection model for detection to obtain the construction engineering data state; the data management module judges the integrity of the real-time engineering data according to the construction engineering data state and, if the real-time engineering data is incomplete, notifies a manager to supplement it; and the data monitoring module, if the real-time engineering data is judged to be abnormal, acquires the abnormal data of the real-time engineering data, transmits the abnormal data to the server for early warning, and monitors the updated data of the real-time engineering data.
The system can thus classify and manage a large amount of engineering data and detect the engineering data, issue a timely early warning if abnormal data exist, issue a timely reminder to supplement the data if engineering data are missing, and monitor the supplementing and modification of the abnormal and missing data, thereby effectively ensuring the integrity and accuracy of the data.
Referring to FIG. 2, in a second embodiment of the engineering data management system with learning function according to the present invention, the feature extraction module comprises an evaluation module establishing unit, a contribution degree definition unit, a SHAP value definition unit, a data weighted average unit, and a model prediction contribution unit:
the evaluation module establishing unit is used for establishing a SHAP feature evaluation model, and calculating the contribution degree of a single feature in the data set to a model prediction result by using the SHAP feature evaluation model to obtain a SHAP value;
a contribution degree definition unit, configured to define the contribution degree as the degree of influence of a feature on the predicted value of the SHAP feature evaluation model, that is, the importance of the feature's contribution to the model prediction; SHAP values include at least positive and negative values, a positive value indicating that the feature pushes the predicted value up and a negative value indicating that the feature pushes the predicted value down;
a SHAP value definition unit, used for determining the definition of the SHAP value [formula not reproduced in the source text], wherein S represents the feature subset input to the SHAP feature evaluation model, and the remaining symbols of the formula (also lost in extraction) represent, in order, the conditional expectation of the subset S, the expected value of the target variable, the expected value of the model prediction, and the sample set;
the data weighted average unit is used for determining that, when the SHAP feature evaluation model is a nonlinear model and the input features are correlated with each other, the SHAP value is a weighted average calculated over all feature orderings, and for combining the calculated conditional expectation with the SHAP value to obtain a target SHAP value;
the model prediction contribution unit is used for analyzing the contribution of each input feature to the prediction result of the SHAP feature evaluation model, and ranking the input features according to the absolute value of the target SHAP value to obtain the engineering data feature set.
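The idea of averaging a feature's contribution over all feature orderings and then ranking features by absolute value can be shown with a small exact Shapley computation. This is a sketch under stated assumptions, not the patent's formula: the conditional expectation is replaced by a fixed baseline vector, and the scoring function `model`, the input `x` and the `baseline` are made up for the example. Exact enumeration is only practical for a handful of features.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features are switched from the baseline
    value to the actual value."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            updated = model(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]

def model(v):
    # Hypothetical nonlinear scoring function with an interaction term.
    return 2.0 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)

# Rank feature indices by the absolute value of their contribution.
ranking = sorted(range(len(x)), key=lambda i: abs(phi[i]), reverse=True)
```

In this example the contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes the values usable as a per-feature attribution.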
Referring to FIG. 3, in a third embodiment of the engineering data management system with learning function according to the present invention, the model building module comprises a category processing sub-module, a model training sub-module, a model node sub-module, a one-hot encoding sub-module, and a factorization machine sub-module:
the category processing sub-module is used for acquiring the engineering data feature set, and processing the imbalanced sample categories in the engineering data feature set by SMOTE sampling to obtain a balanced engineering data set;
the model training sub-module is used for establishing an XGBoost engineering data detection model, inputting the balanced engineering data set into the XGBoost engineering data detection model for training, and performing feature crossing through tree splitting to obtain a target tree structure of the XGBoost engineering data detection model;
the model node sub-module is used for cross-combining the different features along the path from the root node to each leaf node of each tree in the target tree structure, wherein the number of leaf nodes is the number of new features, and the encoding of each sample over all leaf nodes is its new sample feature value;
the one-hot encoding sub-module is used for calculating the predicted probability value obtained by each sample at each leaf node of each tree, and one-hot encoding the leaf node to which each sample's predicted probability value belongs, to obtain an engineering data sparse feature matrix;
and the factorization machine sub-module is used for inputting the engineering data sparse feature matrix into an FM (factorization machine) for training, and classifying the results through a Sigmoid function to obtain the target XGBoost engineering data detection model.
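The last two sub-modules (one-hot encoding of leaf nodes, then a factorization machine with a Sigmoid output) can be sketched in plain Python. The leaf indices, tree sizes and FM parameters `w0`, `w` and `V` below are invented for illustration; in practice the leaf indices would come from the trained trees (the XGBoost Python package can return them via prediction with `pred_leaf=True`) and the FM parameters would be learned.

```python
import math

def one_hot_leaves(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree; exactly one 1 per block."""
    row = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        block = [0] * n_leaves
        block[leaf] = 1
        row.extend(block)
    return row

def fm_predict(x, w0, w, V):
    """Second-order factorization machine score passed through a sigmoid."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)  # equals sum over i<j of <V_i, V_j> x_i x_j
    return 1.0 / (1.0 + math.exp(-(linear + pairwise)))

# A sample that lands in leaf 1 of a 3-leaf tree and leaf 0 of a 2-leaf tree.
row = one_hot_leaves([1, 0], [3, 2])

# Hypothetical (untrained) FM parameters: bias, linear weights, 2-d factors.
w0 = 0.0
w = [0.1, 0.2, 0.3, 0.4, 0.5]
V = [[0.3, 0.1], [1.0, 0.0], [0.2, 0.2], [0.5, 0.0], [0.4, 0.4]]
probability = fm_predict(row, w0, w, V)
```

Because each tree contributes exactly one active leaf, the encoded row is sparse, and the FM's pairwise term then models interactions between leaves of different trees.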
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (1)

1. An engineering data management system with a learning function is characterized by comprising the following modules:
the data acquisition module is used for acquiring the archived engineering data in the system, and carrying out data preprocessing on the archived engineering data to obtain an engineering data set;
the feature extraction module is used for establishing a SHAP feature evaluation model, calculating the contribution degree of a single feature in the data set to a model prediction result by using the SHAP feature evaluation model to obtain a SHAP value, and ranking the input features according to the absolute value of a target SHAP value to obtain an engineering data feature set;
the model building module is used for building an XGBoost engineering data detection model, sampling the unbalanced sample types in the engineering data feature set by utilizing an SMOTE algorithm, inputting the processed engineering data feature set into the XGBoost engineering data detection model for training, and obtaining a target XGBoost engineering data detection model;
the data detection module is used for acquiring real-time engineering data in construction in the system, inputting the real-time engineering data subjected to data processing into the target XGBoost engineering data detection model for detection, and obtaining construction engineering data states;
the data management module is used for judging the integrity of the real-time engineering data according to the construction engineering data state, and notifying a manager to supplement the real-time engineering data if the real-time engineering data is incomplete;
the data monitoring module is used for acquiring abnormal data of the real-time engineering data if the real-time engineering data are judged to be abnormal, transmitting the abnormal data to a server for early warning, and monitoring updated data of the real-time engineering data;
the data acquisition module comprises an acquisition sub-module, a coding sub-module and a processing sub-module:
the acquisition sub-module is used for acquiring the archived engineering data in the system, and the archived engineering data at least comprises: project management archive data, hydrogeological archive data, construction technology archive data, engineering quality archive data, financial report archive data, construction drawing archive data, contract archive data, and construction log archive data;
the coding sub-module is used for carrying out feature coding on the archived engineering data, converting non-numerical data in the archived engineering data into numerical feature data and obtaining coded engineering data;
the processing sub-module is used for carrying out data cleaning processing on the abnormal value data and the missing value data in the encoded engineering data to obtain an engineering data set;
the feature extraction module comprises an evaluation module establishment unit, a contribution degree definition unit, a SHAP value definition unit, a data weighted average unit and a model prediction contribution unit:
the evaluation module establishing unit is used for establishing a SHAP feature evaluation model, and calculating the contribution degree of a single feature in the data set to a model prediction result by using the SHAP feature evaluation model to obtain a SHAP value;
the contribution degree definition unit is used for defining the contribution degree as the degree of influence of a feature on the predicted value of the SHAP feature evaluation model, that is, the importance of the feature's contribution to the model prediction; the SHAP value comprises at least a positive value and a negative value, wherein the positive value indicates that the feature pushes the predicted value up, and the negative value indicates that the feature pushes the predicted value down;
the SHAP value definition unit is used for determining that the SHAP value is defined as [formula not reproduced in the source text], wherein S represents the feature subset input to the SHAP feature evaluation model, and the remaining symbols of the formula (also lost in extraction) represent, in order, the conditional expected value of the subset S, the expected value of the target variable, the expected value of the model prediction, and the sample set;
the data weighted average unit is used for determining that, when the SHAP feature evaluation model is a nonlinear model and the input features are correlated with each other, the SHAP value is a weighted average calculated over all feature orderings, and for combining the calculated conditional expected value with the SHAP value to obtain a target SHAP value;
the model prediction contribution unit is used for analyzing the contribution condition of various input features to the SHAP feature evaluation model prediction result, and sorting the various input features according to the absolute value of the target SHAP value to obtain an engineering data feature set;
the model building module comprises a category processing sub-module, a model training sub-module, a model node sub-module, a one-hot encoding sub-module and a factorization machine sub-module:
the category processing sub-module is used for acquiring an engineering data feature set, and processing the imbalanced sample categories in the engineering data feature set by SMOTE sampling to obtain a balanced engineering data set;
the model training sub-module is used for establishing an XGBoost engineering data detection model, inputting the balanced engineering data set into the XGBoost engineering data detection model for training, and performing feature crossing through tree splitting to obtain a target tree structure of the XGBoost engineering data detection model;
the model node sub-module is used for cross-combining the different features along the path from the root node to each leaf node of each tree in the target tree structure, wherein the number of leaf nodes is the number of new features, and the encoding of each sample over all leaf nodes is its new sample feature value;
the one-hot encoding sub-module is used for calculating the predicted probability value obtained by each sample at each leaf node of each tree, and one-hot encoding the leaf node to which each sample's predicted probability value belongs, to obtain an engineering data sparse feature matrix;
the factorization machine sub-module is used for inputting the engineering data sparse feature matrix into an FM (factorization machine) for training, and classifying the results through a Sigmoid function to obtain a target XGBoost engineering data detection model;
the data detection module comprises a processing unit, a detection unit and a state unit:
the processing unit is used for acquiring real-time engineering data in construction in the system, and performing data processing on the real-time engineering data by utilizing the SHAP characteristic evaluation model to obtain a real-time engineering data characteristic set;
the detection unit is used for inputting the real-time engineering data feature set into the target XGBoost engineering data detection model for detection to obtain a construction engineering data state;
the state unit is used for determining that the construction engineering data state at least comprises an incomplete engineering data state, an abnormal engineering data state and a normal engineering data state;
the data management module comprises a judging sub-module, a state sub-module, a detecting sub-module and a storage sub-module:
the judging sub-module is used for judging the integrity of the real-time engineering data according to the construction engineering data state, and notifying a manager to supplement the real-time engineering data if the engineering data is in an incomplete state;
the state sub-module is used for determining that the incomplete state of the engineering data at least comprises: missing contract data, missing construction drawing data and missing construction log data;
the detection sub-module is used for detecting the supplemented engineering data, and if the detection is passed, the supplemented engineering data is input into the real-time engineering data to obtain target engineering data;
the storage sub-module is used for inputting target engineering data into a database for storage and marking the real-time engineering data as a complete state;
the data monitoring module comprises a judging sub-module, an abnormal sub-module, an early warning sub-module and a monitoring sub-module:
the judging sub-module is used for acquiring abnormal data of the real-time engineering data if judging that the real-time engineering data are abnormal;
the abnormality sub-module is used for determining that abnormal engineering data at least comprises: abnormal contract amounts of engineering materials, abnormal financial data of engineering materials, abnormal construction log data of engineering materials and abnormal detection data of engineering materials;
the early warning sub-module is used for transmitting the abnormal data to a server for early warning and notifying a manager to update the data within 24 hours, acquiring the real-time engineering data after the data update every 24 hours, and acquiring first real-time engineering data;
and the monitoring sub-module is used for inputting the first real-time engineering data into the target XGBoost engineering data detection model for detection, judging whether the first real-time engineering data is abnormal, generating second early warning information if the first real-time engineering data is abnormal, and transmitting the second early warning information to a manager.
CN202311567369.6A 2023-11-23 2023-11-23 Engineering data management system with learning function Active CN117273670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311567369.6A CN117273670B (en) 2023-11-23 2023-11-23 Engineering data management system with learning function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311567369.6A CN117273670B (en) 2023-11-23 2023-11-23 Engineering data management system with learning function

Publications (2)

Publication Number Publication Date
CN117273670A CN117273670A (en) 2023-12-22
CN117273670B true CN117273670B (en) 2024-03-12

Family

ID=89218230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311567369.6A Active CN117273670B (en) 2023-11-23 2023-11-23 Engineering data management system with learning function

Country Status (1)

Country Link
CN (1) CN117273670B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160824A (en) * 2019-12-05 2020-05-15 深圳市铭华航电工艺技术有限公司 Engineering data detection method and device, computer equipment and storage medium
CN111325353A (en) * 2020-02-28 2020-06-23 深圳前海微众银行股份有限公司 Method, device, equipment and storage medium for calculating contribution of training data set
CN116663962A (en) * 2023-04-26 2023-08-29 合肥天秤检测科技有限公司 Be used for hydraulic engineering dyke material quality detection analysis system
CN117056834A (en) * 2023-08-18 2023-11-14 上海墅字科技有限公司 Big data analysis method based on decision tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2021221978A1 (en) * 2020-02-17 2022-09-15 DataRobot, Inc. Automated data analytics methods for non-tabular data, and related systems and apparatus

Also Published As

Publication number Publication date
CN117273670A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN115578015B (en) Sewage treatment whole process supervision method, system and storage medium based on Internet of things
CN111506478A (en) Method for realizing alarm management control based on artificial intelligence
CN109656793A (en) A kind of information system performance stereoscopic monitoring method based on multi-source heterogeneous data fusion
CN110851321A (en) Service alarm method, equipment and storage medium
CN110636066B (en) Network security threat situation assessment method based on unsupervised generative reasoning
CN112766429B (en) Method, device, computer equipment and medium for anomaly detection
CN114201374B (en) Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning
CN112422351A (en) Network alarm prediction model establishing method and device based on deep learning
CN115296933B (en) Industrial production data risk level assessment method and system
CN108306997B (en) Domain name resolution monitoring method and device
CN113469247B (en) Network asset abnormity detection method
CN113850528A (en) Solid waste generation amount evaluation and analysis method and system based on multi-dimensional features
CN114138601A (en) Service alarm method, device, equipment and storage medium
CN117273670B (en) Engineering data management system with learning function
CN111934903B (en) Docker container fault intelligent prediction method based on time sequence evolution gene
CN116796894A (en) Construction method of efficient deep learning weather prediction model
CN115169650B (en) Equipment health prediction method for big data analysis
CN116126807A (en) Log analysis method and related device
CN116955059A (en) Root cause positioning method, root cause positioning device, computing equipment and computer storage medium
CN115330103A (en) Intelligent analysis method and device for urban operation state, computer equipment and storage medium
CN112907111A (en) Intelligent monitoring data acquisition and analysis method based on Internet of things technology
CN111612302A (en) Group-level data management method and equipment
CN118114185B (en) Water engineering safety monitoring data processing method, system, equipment and medium
CN118070135B (en) Power consumption behavior data identification method and device, electronic equipment and storage medium
CN113723811B (en) Equipment maintenance unit assessment method and device based on machine learning and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant