CN115794803A - Engineering audit problem monitoring method and system based on big data AI technology

Info

Publication number: CN115794803A
Application number: CN202310045164.5A
Authority: CN
Inventors: 王开志; 王开向; 李勇; 王涌; 龚峰平; 邱志敏
Original assignee: Beijing Zhizhen Cloud Intelligent Technology Co ltd
Current assignee: Beijing Zhizhen Cloud Intelligent Technology Co ltd
Priority date: 2023-01-30
Filing date: 2023-01-30
Publication date: 2023-03-14
Anticipated expiration: 2043-01-30
Also published as: CN115794803B

Abstract

The invention provides an engineering audit problem monitoring method and system based on big data AI technology. The method comprises: determining an audit requirement according to collected audit information; inputting the audit requirement into a pre-constructed audit association model to obtain audit target data; and inputting the audit target data into a pre-constructed audit autonomous supervision model to obtain an audit monitoring result. By applying data identification and data mining in audit projects, the invention broadens the audit scope and promotes comprehensive audit supervision. The data mining describes, infers, and mines deep, hidden relationships in the audited unit's database and explores and extracts previously unknown patterns in the data, thereby enabling the monitoring of audit problems.

Description

Engineering audit problem monitoring method and system based on big data AI technology

Technical Field

The invention relates to the technical field of audit problem monitoring, in particular to a method and a system for monitoring engineering audit problems based on a big data AI technology.

Background

At present, engineering construction projects generate process data of many types and in large volumes, yet electronic records management in China remains immature. Electronic file management in the engineering construction field is disorganized and incomplete, and large numbers of traditional paper documents still cannot be brought under electronic file management, so current management requirements are not met. As a result, there is no effective lever for real-time supervision of compliance throughout the engineering construction process. As informatization advances, some enterprises manage cost, finance, contracts, and the like through intelligent management systems, but overall data integration is poor, information is fragmented, data quality is low, and useful data is difficult to extract and use. The data interfaces of project participants such as designers, contractors, owners, and suppliers are not unified, so associable master data is lacking and enterprises have no structured data to draw on. Consequently, supervision of the whole engineering construction process is low in quality and efficiency, and management risk is high.

Disclosure of Invention

The invention provides an engineering audit problem monitoring method based on big data AI technology, comprising the following steps:

determining an audit requirement according to collected audit information;

inputting the audit requirement into a pre-constructed audit association model to obtain audit target data;

inputting the audit target data into a pre-constructed audit autonomous supervision model to obtain an audit monitoring result;

wherein the construction of the audit association model comprises the following steps:

performing feature extraction according to the audit requirement, and determining features to be extracted;

performing feature selection according to the audit requirement, and determining categories to be extracted;

based on a preset audit database, extracting data according to the features to be extracted and the categories to be extracted, and performing data cleaning and data deduplication on the extracted data to obtain associated data;

and performing association calculation between the associated data and the audit requirement, and screening out the associated data whose association calculation result is greater than a preset threshold as the audit target data, thereby constructing the audit association model.
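The cleaning, deduplication, association calculation, and threshold screening steps above can be illustrated with a minimal sketch. The patent does not specify how the association calculation is performed, so cosine similarity over token counts is used here purely as an assumption; the function names and the default threshold of 0.3 are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_audit_targets(requirement, records, threshold=0.3):
    """Clean, deduplicate, score, and keep records scoring above the threshold."""
    # Data cleaning: trim whitespace and drop empty records.
    cleaned = [r.strip() for r in records if r and r.strip()]
    # Data deduplication: keep the first occurrence, ignoring case.
    seen, unique = set(), []
    for r in cleaned:
        key = r.lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Association calculation against the audit requirement
    # (cosine similarity is an assumed stand-in, not the patent's formula).
    req_vec = Counter(tokenize(requirement))
    scored = [(r, cosine_similarity(req_vec, Counter(tokenize(r)))) for r in unique]
    # Screening: keep only records whose score exceeds the preset threshold.
    return [(r, s) for r, s in scored if s > threshold]
```

For instance, calling `select_audit_targets("contract payment exceeds budget", records)` on a small list of record strings returns only the deduplicated records that share enough vocabulary with the requirement, each paired with its score.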

Preferably, the construction of the audit autonomous supervision model comprises:

performing multi-modal self-supervised learning on the audit target data, and labeling problem data;

and performing text classification on the problem data according to preset problem attribute values to obtain the audit monitoring result.

Preferably, performing multi-modal self-supervised learning on the audit target data and labeling problem data comprises:

encoding the masked data of the audit target data to obtain primary encoded data;

setting weights for the audit target data based on the audit requirement to obtain weight setting information;

encoding the non-masked data of the audit target data to obtain secondary encoded data;

and determining problem data based on the primary encoded data, the secondary encoded data, and the weight setting information, and labeling the problem data in the audit target data.

Preferably, determining the audit requirement according to the collected audit information comprises:

collecting audit information, and performing structured conversion on the audit information to obtain primary conversion information;

performing multi-scale feature fusion on the primary conversion information to obtain fusion information;

extracting key information from the fusion information, and determining the audit requirement corresponding to the audit information;

wherein the audit information includes: audit target, audit scope, and audit content.

Preferably, the feature to be extracted includes: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.

Based on the same inventive concept, the invention also provides an engineering audit problem monitoring system based on big data AI technology, comprising:

a requirement determination module, configured to determine an audit requirement according to collected audit information;

a target data determination module, configured to input the audit requirement into a pre-constructed audit association model to obtain audit target data;

an audit monitoring module, configured to input the audit target data into a pre-constructed audit autonomous supervision model to obtain an audit monitoring result;

wherein the construction of the audit association model in the target data determination module comprises the following steps:

performing association calculation between the associated data and the audit requirement, and screening out the associated data whose association calculation result is greater than a preset threshold as the audit target data, thereby constructing the audit association model.

Preferably, the construction of the audit autonomous supervision model in the audit monitoring module comprises:

Preferably, performing multi-modal self-supervised learning on the audit target data and labeling problem data in the audit monitoring module comprises:

encoding the masked data of the audit target data to obtain primary encoded data;

Preferably, the requirement determination module is specifically configured to:

extract key information from the fusion information, and determine the audit requirement corresponding to the audit information;

Preferably, the features to be extracted by the target data determination module include: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.

Compared with the closest prior art, the invention has the following beneficial effects:

The invention provides an engineering audit problem monitoring method and system based on big data AI technology. The method comprises: determining an audit requirement according to collected audit information; inputting the audit requirement into a pre-constructed audit association model to obtain audit target data; and inputting the audit target data into a pre-constructed audit autonomous supervision model to obtain an audit monitoring result. The construction of the audit association model comprises: performing feature extraction according to the audit requirement, and determining features to be extracted; performing feature selection according to the audit requirement, and determining categories to be extracted; based on a preset audit database, extracting data according to the features to be extracted and the categories to be extracted, and performing data cleaning and data deduplication on the extracted data to obtain associated data; and performing association calculation between the associated data and the audit requirement, and screening out the associated data whose association calculation result is greater than a preset threshold as the audit target data, thereby constructing the audit association model.

Through data identification and data mining, the invention promotes innovation in electric power audit practice, covering a wide range of business and a large volume of data. By applying data analysis technology in audit projects, power auditing moves from traditional "slash-and-burn" manual methods to information-technology-based methods. The audit scope is widened and comprehensive audit supervision is promoted. The data mining describes, infers, and mines deep, hidden relationships in the audited unit's database, extracting previously unknown patterns in the data and generating audit doubts.

Big data monitoring can uncover audit problems that were not anticipated, or not even conceived of. It can find not only omissions by the audited unit but also deliberate, man-made problems. This greatly enriches audit theory and experience, enlarges the audit scope, increases audit depth, improves audit quality, reduces audit risk, and makes comprehensive auditing possible. It can also discover unknown business rules in massive power data, use those rules to find audit clues hidden in the data, and provide data support for auditors carrying out audit projects.

Additional features of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic flow chart of a method for monitoring engineering audit problems based on big data AI technology in an embodiment of the present invention;

FIG. 2 is a schematic view of an audit flow in an embodiment of the present invention;

FIG. 3 is a schematic flow diagram of a data mining modeling technique employed in an exemplary embodiment of the invention;

FIG. 4 is a schematic diagram of a decision tree model obtained by using a data mining modeling technique according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an engineering audit problem monitoring system based on big data AI technology in an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions, and "plurality" means two or more unless specifically limited otherwise. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Example 1

The embodiment of the invention provides an engineering audit problem monitoring method based on big data AI technology, as shown in FIG. 1, comprising the following steps:

determining an audit requirement according to collected audit information;

the construction of the audit association model comprises the following steps:

The construction of the audit autonomous supervision model comprises the following steps:

Performing multi-modal self-supervised learning on the audit target data and labeling problem data comprises the following steps:

encoding the masked data of the audit target data to obtain primary encoded data;

and labeling the problem data in the audit target data based on the primary encoded data, the secondary encoded data, and the weight setting information.

Preferably, the determining an audit requirement according to the collected audit information includes:

The feature to be extracted comprises: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.

A specific embodiment of the engineering audit problem monitoring method based on big data AI technology is illustrated in FIG. 2 and includes the following steps:

Step 101: collecting and storing audit data. The basic situation of the audited unit is studied in depth, and the specific audit requirement is determined by analyzing the audit target, audit scope, and audit content. Once the audit requirement is clear, collecting a large amount of unstructured data related to the audited unit is a key step in audit data analysis. On the one hand, data from outside the enterprise can be acquired by means such as web crawlers, OCR (optical character recognition), and APIs (application programming interfaces); on the other hand, unstructured data can be collected through the various data access interfaces available. To ensure the integrity and authenticity of the audit data, a strict, standardized system must be established so that the collected unstructured data is managed in a sound and secure manner;

Step 102: mining the audit data text. Text mining comprises three parts.

First, the core content for audit data analysis is mined through feature extraction and feature selection. Feature extraction uses a vector model to extract words or phrases that represent the feature information of the audit text and that distinguish and identify its category; a text representation model is established, converting the unstructured audit text into a form that a computer can process. Feature selection keeps the features that best express the content of the audit text: according to the audit purpose, the auditor applies a feature selection method to further filter the extracted text features, which avoids junk data and ensures the validity of the data.

Second, the audit analysis data is mined through text summarization, association analysis, text classification, and text clustering. Text summarization obtains the abstract of an audit text by obtaining a central document, and auditors improve analysis efficiency through convenient browsing and fast audit-clue queries. Association analysis targets the associations among knowledge in text mining: it is applied to the unstructured audit texts to study implicit relationships that may exist between audit documents and to obtain the essential connection between audit matters and audit targets. Text classification is a form of supervised learning. Text clustering, by contrast, needs no supervised learning and no training, and it can surface information hidden in big data in unusual forms that conventional audit data analysis methods struggle to find. After the audit texts are clustered, each cluster can be analyzed, compared, and summarized by category and the causes of abnormal data can be examined, which greatly reduces the auditors' workload in audit data analysis.

Third, the complex audit data is expressed in a way that users can understand. The knowledge acquired by text mining can be displayed clearly through visual symbols (network diagrams, tree diagrams, Venn diagrams, coordinate plots, and the like). Auditors can analyze, explain, and evaluate the visual results against the audit targets and evaluation indexes, and examine the audit data more deeply and more comprehensively, in multiple dimensions and from different angles;

Step 103: summarizing and releasing. Auditors and technicians jointly screen, classify, sort, and analyze in depth the results presented by the visual analysis of the previous stage, summarize and standardize the valuable and effective audit knowledge, and form audit experience and audit clues that auditors can use for analysis and evidence collection.
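The feature extraction and feature selection described in step 102 can be illustrated with a small vector-model sketch. The patent does not name a specific weighting scheme, so TF-IDF is assumed here as one common choice; `tfidf_vectors` and `top_terms` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using TF-IDF weights."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k=3):
    """Feature selection: keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that appear in every document receive a weight of zero, so the selected features are those that distinguish one audit text from the others.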

As shown in FIG. 3, the invention also uses a data mining modeling technique to monitor engineering audit problems. Identifying a user's electricity-use category is essentially a classification problem, so a classification algorithm from data mining can be used for pattern recognition. Because decision tree models yield interpretable results, a decision tree algorithm is selected for the data mining. The modeling indexes mainly considered are the user's operating capacity, the time, the daily share of electricity use, the hourly share of electricity use, and the electricity-use category. The modeling data is screened to keep records whose electricity-use category is more accurate, and a decision tree algorithm is then used to build an electricity-use category identification model. The mining model is then used to predict users whose category is uncertain and identify their actual electricity-use category. Users whom the model predicts to be industrial or commercial but who are actually registered as residents are treated as audit-clue users.
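The final screening step, in which a mismatch between the predicted and the registered category produces an audit clue, can be sketched as follows. The `User` record and its field names are hypothetical; only the selection rule comes from the description.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    registered_category: str  # category recorded for the user (hypothetical field)
    predicted_category: str   # category predicted by the trained model


def audit_clue_users(users):
    """Return users registered as residents but predicted industrial or commercial."""
    suspicious = {"industrial", "commercial"}
    return [
        u for u in users
        if u.registered_category == "resident" and u.predicted_category in suspicious
    ]
```

Users whose prediction agrees with their registration, or who are registered under a non-resident category, are left out of the clue list.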

The method specifically comprises the following steps:

Step 201: algorithm idea. Decision tree learning is an example-based inductive learning algorithm. It focuses on inferring classification rules, expressed in the form of a decision tree, from a set of unordered, irregular cases. It is usually used to build classifiers and prediction models and can classify or predict unknown data, mine data, and so on. It comprises two steps. In the first step, a decision tree is built and refined using a training sample set to establish a decision tree model; this is in effect a process of acquiring knowledge from data, that is, machine learning. In the second step, the established decision tree is used to classify new data. Among the various decision tree classification algorithms, the most influential is the ID3 algorithm proposed by Quinlan, on the basis of which the C4.5 algorithm was later proposed.

The model uses the classic C4.5 algorithm to construct the decision tree. C4.5 was proposed by J. R. Quinlan in 1993 and evolved from the ID3 (Iterative Dichotomiser 3) algorithm, which Quinlan proposed in 1986. ID3 is a representative decision tree algorithm, and most decision tree algorithms are improvements built on it. It adopts a divide-and-conquer strategy and uses information gain as the attribute selection criterion at the nodes on each level of the decision tree, so that testing at each non-leaf node yields the maximum category information about the tested records. Specifically, it examines all attributes, selects the attribute with the largest information gain to generate a decision tree node, creates a branch for each distinct value of that attribute, and then recursively applies the same procedure to the subset of each branch to build the child nodes, until every subset contains data of only one category. The result is a decision tree that can classify new samples.
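The ID3 attribute selection rule described above can be written out as a short sketch. Rows are assumed to be dictionaries of discrete attribute values, and the function names are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on a discrete attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 selection rule: pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

A full ID3 implementation would call `best_attribute` at each node, partition the rows by the chosen attribute's values, and recurse until each partition holds a single class.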

The C4.5 algorithm inherits the advantages of the ID3 algorithm and introduces new methods and functions:

(1) the information gain ratio is used to overcome the bias toward multi-valued attributes that arises when attributes are selected by information gain;

(2) pruning is carried out during tree construction to avoid overfitting of the tree;

(3) continuous attributes can be discretized;

(4) training sample sets with missing attribute values can be processed;

(5) incomplete data can be processed;

(6) k-fold cross-validation;

(7) generation of production rules.

The C4.5 algorithm reduces computational complexity and improves computational efficiency. Its most important improvement over the ID3 algorithm is the use of the information gain ratio to select attributes. Theory and experiments show that the information gain ratio performs better than information gain, mainly because it overcomes ID3's bias toward attributes that take many values. The C4.5 algorithm can also process continuous-valued attributes, making up for the fact that the ID3 algorithm can only process discrete-valued attribute data.
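The difference between information gain and the information gain ratio can be seen in a small sketch. The gain ratio divides the information gain by the split information, which is the entropy of the attribute's own value distribution; the function name `gain_and_ratio` is illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_and_ratio(column, labels):
    """Return (information gain, gain ratio) for one discrete attribute column."""
    total = len(labels)
    partitions = {}
    for value, label in zip(column, labels):
        partitions.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(p) / total * entropy(p) for p in partitions.values())
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(column)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio
```

With four users labeled two commercial and two resident, a unique user-ID attribute and a two-valued peak-time attribute that matches the labels both reach an information gain of 1 bit, so information gain alone cannot tell them apart. The gain ratio is 0.5 for the user ID (split information of 2 bits) and 1.0 for the peak-time attribute, so C4.5 prefers the attribute with fewer values.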

Step 202: constructing the model. The training data is a summary of the electricity consumption of a certain city in January, and five attributes describe the data: the operating capacity, the daily electricity consumption in January, the share of each day's consumption in the whole month's consumption, the on-the-hour electricity consumption, and the share of each on-the-hour consumption in the whole month's consumption. A decision tree is constructed with the C4.5 algorithm, and the decision tree model shown in FIG. 4 is obtained from the monthly electricity consumption data and the electricity-use categories.

Each path from the root node of the tree to a leaf node forms one classification rule. The classification rules represented by the decision tree in FIG. 4 are shown in Table 1:

TABLE 1 decision Tree model rule Table

No.	Rule conditions	Electricity-use category	Rule reliability (%)
1	Electricity share at 24:00 <= 0.029565529941723567 and electricity share on day 27 > 0.0287835562068748 and electricity share on day 30 > 0.013059723327690499	Commercial	96.42857
2	Electricity share at 07:00 > 0.014347999362346564 and electricity share at 18:00 <= 0.05558296635932825 and electricity share on day 30 <= 0.013059723327690499	Industrial	91.04478
3	Electricity share at 24:00 > 0.0295655299723567 and electricity share on day 1 > 0.03092121905804584 and electricity share on day 30 > 0.013059723327690499	Resident	87.7193
4	Electricity share at 24:00 <= 0.029565529941723567 and electricity share on day 27 <= 0.0287835562068748 and electricity share on day 30 > 0.013059723327690499	General industry	83.33333
5	Electricity share at 07:00 <= 0.014347999362346564 and electricity share at 18:00 <= 0.05558296635932825 and electricity share on day 30 <= 0.013059723327690499	General industry	75
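The rules in Table 1 can be applied to a user's feature values with a sketch like the one below. Several points are assumptions rather than statements from the patent: the feature keys (`h24`, `d27`, and so on) are hypothetical names; comparison operators that are illegible in the table are read as `<=` because each such rule is the complement of a rule using `>`; and rule 3 is assumed to share the 24:00 threshold of rules 1 and 4, although its printed digits differ slightly.

```python
# Thresholds copied from Table 1; the feature keys used below are hypothetical.
H24 = 0.029565529941723567  # share of monthly usage at 24:00
D27 = 0.0287835562068748    # share of monthly usage on day 27
D30 = 0.013059723327690499  # share of monthly usage on day 30
D1 = 0.03092121905804584    # share of monthly usage on day 1
H7 = 0.014347999362346564   # share of monthly usage at 07:00
H18 = 0.05558296635932825   # share of monthly usage at 18:00


def classify(f):
    """Return (category, rule reliability in %) or None if no Table 1 rule matches."""
    if f["d30"] > D30:
        if f["h24"] <= H24:
            if f["d27"] > D27:
                return ("commercial", 96.42857)    # rule 1
            return ("general industry", 83.33333)  # rule 4
        if f["d1"] > D1:
            return ("resident", 87.7193)           # rule 3
        return None
    if f["h18"] <= H18:
        if f["h7"] > H7:
            return ("industrial", 91.04478)        # rule 2
        return ("general industry", 75.0)          # rule 5
    return None
```

The five rules do not cover every combination of feature values, so the function returns `None` for inputs that fall outside all of them.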

Step 203: analyzing the model result. The January electricity consumption data of the city is run through the decision tree classification model to check the model's effectiveness on the training data set. The overall evaluation indexes selected are the correct classification rate and the kappa statistic; the per-class evaluation indexes are accuracy, coverage, and F-measure. The overall accuracy of the model reaches 86.3%, and the accuracy for each class is also high.
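The evaluation indexes named in step 203 can be computed as in the sketch below. It assumes that the per-class "accuracy" and "coverage" correspond to what is commonly called precision and recall, which the patent does not state explicitly; the kappa statistic is computed as Cohen's kappa.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return overall accuracy, Cohen's kappa, and per-class precision/recall/F1."""
    n = len(actual)
    classes = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))
    actual_n = Counter(actual)
    predicted_n = Counter(predicted)

    # Correct classification rate: fraction of samples on the diagonal.
    accuracy = sum(pairs[(c, c)] for c in classes) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(actual_n[c] * predicted_n[c] for c in classes) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 1.0

    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        precision = tp / predicted_n[c] if predicted_n[c] else 0.0
        recall = tp / actual_n[c] if actual_n[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, kappa, per_class
```

Kappa corrects the correct classification rate for agreement that would occur by chance, which matters when one electricity-use category dominates the data.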

Example 2

An embodiment of the present invention provides an engineering audit problem monitoring system based on big data AI technology, as shown in FIG. 5, including:

an audit monitoring module, configured to input the audit target data into a pre-constructed audit autonomous supervision model to obtain an audit monitoring result;

The construction of the audit autonomous supervision model in the audit monitoring module comprises the following steps:

Performing multi-modal self-supervised learning on the audit target data and labeling problem data in the audit monitoring module comprises the following steps:

encoding the masked data of the audit target data to obtain primary encoded data;

encoding the non-masked data of the audit target data to obtain secondary encoded data;

and labeling the problem data in the audit target data based on the primary encoded data, the secondary encoded data, and the weight setting information.

The requirement determination module is specifically configured to:

wherein the audit information includes: audit target, audit scope, and audit content.

The features to be extracted in the target data determination module comprise: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An engineering audit problem monitoring method based on big data AI technology, characterized by comprising the following steps:

determining an audit requirement according to collected audit information;

the construction process of the audit association model comprises the following steps:

2. The method of claim 1, wherein the construction of the audit autonomous supervision model comprises:

3. The method of claim 2, wherein performing multi-modal self-supervised learning on the audit target data and labeling problem data comprises:

encoding the masked data of the audit target data to obtain primary encoded data;

4. The method of claim 1, wherein determining audit requirements based on the collected audit information comprises:

collecting audit information, and performing structured conversion on the audit information to obtain primary conversion information;

5. The method of claim 1, wherein the features to be extracted comprise: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.

6. An engineering audit problem monitoring system based on big data AI technology, characterized by comprising:

7. The system of claim 6, wherein the construction of the audit autonomous supervision model in the audit monitoring module comprises:

8. The system of claim 7, wherein performing multi-modal self-supervised learning on the audit target data and labeling problem data in the audit monitoring module comprises:

encoding the masked data of the audit target data to obtain primary encoded data;

9. The system of claim 6, wherein the requirement determination module is specifically configured to:

10. The system of claim 6, wherein the features to be extracted in the target data determination module comprise: text features and symbolic features; the category to be extracted comprises: text part of speech and text attributes.