CN113673828B - Audit data processing method, system, medium and device based on knowledge graph and big data - Google Patents


Info

Publication number
CN113673828B
Authority
CN
China
Prior art keywords
data
project
audit
early warning
database
Prior art date
Legal status
Active
Application number
CN202110836103.1A
Other languages
Chinese (zh)
Other versions
CN113673828A (en
Inventor
张莉
王磊
王宁宁
李卓松
Current Assignee
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Application filed by Beijing Information Science and Technology University
Priority to CN202110836103.1A
Publication of CN113673828A
Application granted
Publication of CN113673828B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology


Abstract

The invention discloses an audit data processing method, system, medium and device based on a knowledge graph and big data. The method comprises: obtaining various basic information and corresponding data related to a project audit, together with sample data of similar projects in a big database; determining an audit data risk coefficient of the project from the acquired data, wherein the audit data risk coefficient represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects; and determining, according to the audit data risk coefficient of the project, whether early warning is needed. By applying big data technology, the invention establishes a suitable audit data risk model and algorithm. Through objective comparison with sample data of similar projects in the database, it can quickly and effectively identify the data risks hidden in the basic information and corresponding data related to the project audit, and it displays the results and issues early warnings through visualization technologies such as a knowledge graph.

Description

Audit data processing method, system, medium and device based on knowledge graph and big data
Technical Field
The invention belongs to the technical field of big data mining, and particularly relates to an audit data processing method, system, medium and device based on a knowledge graph and big data.
Background
With the advent of the Internet information era, big data mining technology has been widely applied and continues to drive the development of data processing methods in many fields. In the auditing field, however, existing audit practice still relies mainly on traditional manual methods. For example, traditional random sampling analysis still presents its results as a list of problems, which gives report users no intuitive picture. Existing audit methods also process the basic data only as far as the preprocessing stage of normalizing and standardizing the data: they do not mine big data from large databases covering different project categories, do not systematically and comprehensively process and analyze the data of the audited project, and do not objectively quantify, visually display, or issue early warnings about audit data risk. There is therefore an urgent need to use big data and knowledge graph technology to strengthen the weak area of audit data processing, and to improve audit efficiency and audit quality through advanced technical means.
Disclosure of Invention
In view of the above problems, the present application provides a method, a system, a medium, and an apparatus for processing audit data based on a knowledge graph and big data to solve the above technical problems. Specifically, the invention provides the following technical scheme:
In a first aspect, the present invention provides an audit data processing method, including:
acquiring various basic information and corresponding data related to a project audit, and sample data of similar projects in a big database;
determining an audit data risk coefficient of the project according to the basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects;
determining whether early warning is needed according to the audit data risk coefficient of the project;
wherein the audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning.
In a second aspect, the present invention provides an audit data processing system, the system including:
a data acquisition module, configured to acquire various basic information and corresponding data related to a project audit, and sample data of similar projects in a big database;
a big data storage module, configured to store the various audit-related data so that each module can store and retrieve them;
a data processing module, configured to determine an audit data risk coefficient of the project according to the basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects, and to determine whether early warning is needed according to the audit data risk coefficient of the project;
a result visualization output module, configured to issue early warning information and display the various visual display data;
wherein the audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning.
In a third aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the audit data processing method of the first aspect.
In a fourth aspect, the invention provides a computer apparatus comprising a memory and a processor; the memory for storing a computer program; the processor is configured to implement the audit data processing method according to the first aspect when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
(1) By applying big data technology, the invention establishes a suitable audit data risk model and algorithm. Through objective comparison with sample data of similar projects in the database, it can quickly and effectively identify the data risks hidden in the basic information and corresponding data related to a project audit, and it displays the results and issues early warnings through visualization technologies such as a knowledge graph;
(2) The invention acquires various basic information and corresponding data related to the project audit, and sample data of similar projects in the big database. These data are updated dynamically in real time, which helps keep the processing of project audit data timely and accurate. The sample data in the big database are continuously updated and expanded, so the accuracy and matching degree of the big data mining keep improving during use, and online dynamic audit data processing is realized;
(3) The method first mines the sample data in a sub-database composed of similar projects whose categories are the same as or close to those of the project to be audited, then compares the sample data with the actual data of the project to be audited, and then performs data processing and analysis to obtain the audit data risk coefficient of the whole project and the itemized data risk coefficient of each expenditure category in the project;
(4) The method provides risk early warning according to the magnitude of the audit data risk coefficient of the project: it judges whether the coefficient exceeds a preset threshold and, if so, issues early warning information to remind the recipient that the project carries audit risk. It also visually displays the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, and the similar-project data used from the big database;
(5) The invention may use a local computer processing system, a distributed processor system in a cloud computing platform, or a combination of the two, and dynamically adjusts and balances the load of different resources across the whole system, which addresses the reasonable use and effective management of a large-scale system;
(6) The invention takes, as sample data, the data of projects in the big data that are the same as or similar to the project to be audited, and these sample data form a sub-database that can be updated and expanded in real time. The basic data used for big data mining therefore have high similarity and consistency, which improves the accuracy and credibility of the audit data processing;
(7) Before the audit data risk coefficient of the project is determined, the project data in the big database are compared according to the expenditure category information of the project, similar projects are matched, and all similar projects confirmed as matching form a sub-database within the big database for unified data processing. Because the data are filtered and classified before calculation, the data processing flow is simplified, the computational load is greatly reduced, the system runs faster and more accurately, and unnecessary interference is reduced;
(8) The data acquired by the invention, the big data mining results, other related data, and audit cases can be provided to other related systems, so that data can be shared and systems can work cooperatively.
Drawings
For ease of illustration, the invention is described in detail by the following detailed description and the accompanying drawings.
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic flow diagram of another method of the present invention;
FIG. 3 is a schematic diagram of the system of the present invention;
FIG. 4 is a schematic diagram of another system configuration of the present invention;
FIG. 5 is a schematic diagram of a computer-readable storage medium of the present invention;
FIG. 6 is a schematic diagram of a computer device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Example 1
As shown in fig. 1-2, the present invention provides an audit data processing method, which in one embodiment may preferably be performed by:
acquiring various basic information and corresponding data related to a project audit, and sample data of similar projects in a big database;
determining an audit data risk coefficient of the project according to the basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects;
determining whether early warning is needed according to the audit data risk coefficient of the project; the risk coefficient is the basis for the final decision on whether to issue an early warning or prompt.
In a more specific embodiment, whether to issue an early warning may be determined, for example, by setting a threshold for the audit data risk coefficient and comparing the coefficient with that threshold. Alternatively, the historical data and early warning records of past audit projects of the same type may be analyzed, a reasonable median of the early warning points and the corresponding early warning coefficients may be determined using, for example, a BP model, and this median may serve as the basis for a reference standard of the risk coefficient, from which it is decided whether to issue an early warning.
In a more preferred embodiment, when early warning is used, several risk coefficient thresholds may be set on the basis of historical data, so that early warning levels are reasonably graded according to the different early warning points. Users of the system can then better judge the urgency or severity of a warning, and subsequent personnel can handle the risk accordingly.
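The graded comparison described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `warning_level` and the example threshold values are assumptions, and a threshold is treated as exceeded only when the coefficient is strictly greater than it.

```python
from bisect import bisect_left
from typing import Sequence


def warning_level(risk_coefficient: float, thresholds: Sequence[float]) -> int:
    """Return 0 for "no warning", or the 1-based early warning level.

    The level is the number of thresholds that the coefficient strictly
    exceeds, so a single threshold gives a plain warn / do-not-warn decision.
    """
    ordered = sorted(thresholds)
    # bisect_left counts the thresholds that are strictly below the coefficient.
    return bisect_left(ordered, risk_coefficient)


# Hypothetical thresholds; real values would be derived from historical data.
EXAMPLE_THRESHOLDS = [0.2, 0.5, 0.8]
```

With a single threshold the same function reduces to the basic threshold check, since it returns either 0 or 1.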
The audit data risk coefficient of a project that needs early warning is larger than that of a project that does not. In a preferred embodiment of the present invention, the higher the audit data risk coefficient, the higher the risk, and the more likely it is that the corresponding data or project information triggers an early warning, or the higher the level of the triggered warning.
The various basic information and corresponding data related to the project audit comprise static data and dynamic data, and the related data are dynamically updated. After a project audit is finished and the project data are confirmed to be normal, the data are added as sample data to the sub-database of similar projects in the big database, for big data mining in subsequent audit data processing. Online dynamic audit data processing is thereby realized.
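The update rule for the sample data can be sketched as below. The class `SubDatabase`, the function `archive_audited_project`, and the record layout are hypothetical names introduced only for illustration; the sketch assumes that a project enters the sample set only once its audit is finished and its data are confirmed normal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubDatabase:
    """Hypothetical container for the sample data of one group of similar projects."""
    label: str
    samples: List[Dict[str, float]] = field(default_factory=list)


def archive_audited_project(sub_db: SubDatabase,
                            project_spending: Dict[str, float],
                            audit_finished: bool,
                            confirmed_normal: bool) -> bool:
    """Add a project to the sample set only if it is finished and confirmed normal."""
    if audit_finished and confirmed_normal:
        # Store a copy so later changes to the project record do not alter the sample.
        sub_db.samples.append(dict(project_spending))
        return True
    return False
```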
Further, after the basic information and corresponding data related to the project audit and the sample data of similar projects in the big database are obtained, the project data in the big database are compared according to the expenditure category information of the project, similar projects are matched, and all similar projects confirmed as matching form a sub-database within the big database for unified data processing.
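One way to picture this matching step is the sketch below. The text does not say how expenditure categories are compared, so the Jaccard similarity of category sets, the `min_similarity` cut-off, and the record field `"categories"` are all assumptions of this example.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two category sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_sub_database(project_categories: Iterable[str],
                       big_database: List[Dict],
                       min_similarity: float = 0.8) -> List[Dict]:
    """Keep the records whose expenditure categories are close to the project's."""
    target = set(project_categories)
    return [record for record in big_database
            if jaccard(target, set(record["categories"])) >= min_similarity]
```

Filtering before any risk calculation keeps the later steps restricted to comparable projects, which is the reduction in computational load that the text points to.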
Preferably, during matching, a filter method, a wrapper method, an embedded method, or the like is used for feature comparison and screening, and the group features and individual weight features of the audited projects are further mined to improve the matching degree.
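As a rough illustration of a filter method, the sketch below screens features by a statistic computed independently of any model. The choice of variance as the statistic, the `min_variance` cut-off, and the function name `variance_filter` are assumptions of this example; the text does not specify which filter is used.

```python
from statistics import pvariance
from typing import Dict, List


def variance_filter(rows: List[Dict[str, float]],
                    min_variance: float = 1e-3) -> List[str]:
    """Return the feature names whose population variance exceeds the cut-off.

    Features that barely vary across projects carry little information for
    telling projects apart, so a filter method can drop them before matching.
    """
    if not rows:
        return []
    names = list(rows[0].keys())
    return [name for name in names
            if pvariance([row[name] for row in rows]) > min_variance]
```

Wrapper and embedded methods differ in that they evaluate feature subsets with, or inside, a trained model; they are not shown here.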
Further, the various basic information and corresponding data related to the project audit comprise: the project expenditure categories involved in the project audit, the total budget data for all expenditure categories, the budget data for each expenditure category, and the corresponding actual expenditure data.
Further, the sample data of similar projects in the big database comprise: the project expenditure categories of the similar project samples in the big database, the total expenditure data for all expenditure categories, and the actual data corresponding to each class of expenditure.
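The two kinds of input data listed above could be held in structures like the following. The class and field names are hypothetical, and computing the totals as sums over the categories is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AuditedProject:
    """Data of the project under audit, keyed by expenditure category."""
    budget: Dict[str, float]   # budget data per expenditure category
    actual: Dict[str, float]   # actual expenditure data per category

    @property
    def total_budget(self) -> float:
        # Assumed here to be the sum of the per-category budgets.
        return sum(self.budget.values())


@dataclass
class SampleProject:
    """One similar project sample from the big database."""
    spending: Dict[str, float]  # actual data per expenditure category

    @property
    def total_spending(self) -> float:
        # Assumed here to be the sum over all expenditure categories.
        return sum(self.spending.values())
```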
In a more preferred embodiment, determining whether early warning is needed according to the audit data risk coefficient of the project means judging whether the audit data risk coefficient of the project exceeds a preset threshold. If it does, early warning information is issued to remind the recipient that the project carries audit risk, and the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, and the similar-project data used from the big database are displayed visually, for example by means of knowledge graph visualization. The itemized data risk coefficient of each expenditure category is the itemized data risk coefficient of the i-th class of expenditure in the project; the similar-project data are processed as a sub-database within the big database.
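A knowledge graph is commonly represented as subject-predicate-object triples, and the results listed above could be arranged that way before being handed to a visualization layer. The sketch below is only an illustration: the predicate names, the node naming scheme, and the function `risk_triples` are assumptions, and no particular graph or plotting library is implied.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, object]


def risk_triples(project_id: str,
                 overall_risk: float,
                 item_risks: Dict[str, float],
                 sample_ids: Sequence[str],
                 threshold: float) -> List[Triple]:
    """Arrange the audit results as (subject, predicate, object) triples."""
    triples: List[Triple] = [
        (project_id, "has_audit_data_risk_coefficient", overall_risk),
    ]
    for category, risk in item_risks.items():
        node = f"{project_id}/{category}"  # one node per expenditure category
        triples.append((project_id, "has_expenditure_category", node))
        triples.append((node, "has_itemized_risk_coefficient", risk))
    for sample_id in sample_ids:
        triples.append((project_id, "compared_with_sample", sample_id))
    if overall_risk > threshold:
        triples.append((project_id, "triggers", "early_warning"))
    return triples
```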
In a preferred implementation, a NoSQL database is used to acquire, store, and retrieve the basic data during data acquisition, database screening and matching, data processing, result output, and so on. NoSQL generally refers to non-relational databases, which emerged to meet the challenges posed by large data sets with many data types, especially in big data applications, and which can provide fast, scalable storage for big data. In a traditional relational database, the logical schema must be designed first, with the character length and type defined for each stored variable, so the data schema is static. In a big data environment the data schema changes dynamically, which traditional database technology handles poorly. Moreover, as data types expand, documents, reports, pictures, audio, video, and similar data cannot be conveniently stored in a relational database, yet these data types become data information required by this embodiment, so a NoSQL database is used for data acquisition.
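The contrast between a fixed schema and dynamically changing records can be illustrated with a small in-memory stand-in for a document store. This is not a real NoSQL product or its API; the class `DocumentCollection` and its methods are hypothetical and only show that records with different fields can live in the same collection.

```python
import json
from typing import Any, Dict, List


class DocumentCollection:
    """In-memory stand-in for a schema-less document collection."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        # Round-trip through JSON to store an independent, serializable copy.
        self._docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        """Return the documents whose fields equal all the given criteria."""
        return [doc for doc in self._docs
                if all(doc.get(key) == value for key, value in criteria.items())]


audit_docs = DocumentCollection()
# Two records with different fields: no schema change is needed for the second.
audit_docs.insert({"project": "P1", "type": "budget", "labor": 60.0})
audit_docs.insert({"project": "P1", "type": "report", "file": "summary.pdf", "pages": 12})
```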
In a preferred implementation, this embodiment uses a third-party big data cluster or an open-source Hadoop big data cluster.
Further, the audit data risk coefficient of the project is determined as follows:
Figure BDA0003177331540000041
wherein:
P is the audit data risk coefficient of the project;
i is the index of the i-th class of expenditure in the project;
m is the total number of expenditure categories in the project;
w_i0 is the budget data for the i-th class of expenditure in the project;
w is the total budget data for all expenditure categories in the project;
j is the index of the j-th similar project sample in the big database to which the project belongs;
n is the total number of similar project samples in the big database to which the project belongs;
q_ij is the data of the i-th class of expenditure of the j-th similar project sample in the big database to which the project belongs;
Q_j is the total expenditure data of all expenditure categories of the j-th similar project sample in the big database to which the project belongs;
w_i is the actual expenditure data for the i-th class of expenditure in the project;
k is an adjustment coefficient.
In addition, the itemized data risk coefficient of the i-th class of expenditure in the project is as follows:
Figure BDA0003177331540000051
wherein P_i is the itemized data risk coefficient of the i-th class of expenditure in the project.
Note: the data used in the audit data risk coefficient determination method are either all actual data of the project or all preprocessed data. Data preprocessing, data cleaning, data conversion, data integration and data loading are existing data processing means and are not described in detail here.
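The formulas for P and P_i are available only as images, so the sketch below does not reproduce them. It only shows how the defined inputs (q_ij, Q_j, w_i, k) could be arranged in code, and it computes a purely illustrative deviation measure, namely k times the absolute difference between a category's share of the project's actual spending and the mean share of that category across the samples. That measure, and the function names, are assumptions of this example and should not be read as the patented formula.

```python
from typing import Dict, List


def sample_mean_shares(samples: List[Dict[str, float]],
                       categories: List[str]) -> Dict[str, float]:
    """Mean over the samples j of q_ij / Q_j, for each category i."""
    totals = {category: 0.0 for category in categories}
    for spending in samples:
        sample_total = sum(spending.values())  # Q_j, assumed to be the sum of q_ij
        for category in categories:
            totals[category] += spending.get(category, 0.0) / sample_total
    n = len(samples)
    return {category: value / n for category, value in totals.items()}


def illustrative_item_deviation(actual: Dict[str, float],
                                samples: List[Dict[str, float]],
                                k: float = 1.0) -> Dict[str, float]:
    """Illustrative only: k * |actual share - mean sample share| per category."""
    categories = list(actual)
    total_actual = sum(actual.values())
    mean_shares = sample_mean_shares(samples, categories)
    return {category: k * abs(actual[category] / total_actual - mean_shares[category])
            for category in categories}
```

For a project spending 60 on labor and 40 on materials, compared with samples whose labor shares are 0.5 and 0.3, the mean labor share is 0.4 and the illustrative deviation for labor is about 0.2.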
Preferably, throughout the implementation, a local computer processing system or a distributed processor system in a cloud computing platform is used. The basic data acquired at the front end are imported into the distributed processor system, for example a distributed database or a distributed storage cluster, and cleaned or preprocessed after import, so that the processing requirements of massive data can be met; the import rate can often reach hundreds of megabytes per second, or even the gigabyte level. More preferably, the distributed processor system integrates dynamic load balancing with group management and deployment mechanisms, so that the platform can monitor the running state of every node in real time and dynamically adjust and balance the load of different resources across the whole system, which addresses the reasonable use and effective management of a large-scale system.
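The idea of balancing load across nodes can be illustrated with a simple greedy rule: each new task goes to the node that currently carries the least load. The text does not describe the balancing algorithm, so this rule, the cost values, and the function `assign_tasks` are assumptions of the sketch.

```python
import heapq
from typing import Dict, List, Sequence


def assign_tasks(task_costs: Sequence[float],
                 node_names: Sequence[str]) -> Dict[str, List[int]]:
    """Greedy balancing: send each task to the currently least-loaded node."""
    heap = [(0.0, name) for name in node_names]  # (current load, node name)
    heapq.heapify(heap)
    assignment: Dict[str, List[int]] = {name: [] for name in node_names}
    for index, cost in enumerate(task_costs):
        load, name = heapq.heappop(heap)          # node with the smallest load
        assignment[name].append(index)
        heapq.heappush(heap, (load + cost, name))
    return assignment
```

A production system would also react to node failures and measured runtime load; the sketch only covers the static assignment step.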
Example 2
As shown in fig. 3-4, the present invention provides an audit data processing system comprising:
a data acquisition module, configured to acquire various basic information and corresponding data related to a project audit, and sample data of similar projects in a big database;
wherein the various basic information and corresponding data related to the project audit comprise static data and dynamic data, and the related data are dynamically updated; after a project audit is finished and the project data are confirmed to be normal, the data are added as sample data to the sub-database of similar projects in the big database, for big data mining in subsequent audit data processing; online dynamic audit data processing is thereby realized;
a big data storage module, configured to store the various audit-related data so that each module can store and retrieve them;
wherein the stored data comprise the data of the project to be audited, the sample data of various projects, the sub-database within the big database formed from all similar projects confirmed as matching, preprocessed data, intermediate processing data, result data, visual display data, and the like;
a data processing module, configured to determine an audit data risk coefficient of the project according to the basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects, and to determine whether early warning is needed according to the audit data risk coefficient of the project;
a result visualization output module, configured to issue early warning information and display the various visual display data.
In a more preferred embodiment, the result visualization output module issues early warning information to remind the recipient that the project carries audit risk, and uses visualization technologies such as a knowledge graph to display the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, and the similar-project data used from the big database.
The audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning.
In a more preferred embodiment, when different levels are set for risk early warning, several thresholds against which the risk coefficient is compared can be set so that the warnings are reasonably graded; when the system visually presents an audit risk warning, the levels can be distinguished by, for example, different colors or different prompt blinking frequencies.
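Mapping a warning level to its visual cues can be as simple as a lookup table. The colors, blink frequencies, and names below are hypothetical choices for illustration; the text only says that levels may be distinguished by color and blinking frequency.

```python
from typing import Dict, Tuple

# level -> (color, blink frequency in Hz); all values are illustrative.
WARNING_STYLES: Dict[int, Tuple[str, float]] = {
    0: ("green", 0.0),   # no warning, no blinking
    1: ("yellow", 0.5),
    2: ("orange", 1.0),
    3: ("red", 2.0),
}


def display_style(level: int) -> Tuple[str, float]:
    """Return the (color, blink_hz) pair for a level, clamped to the known range."""
    clamped = min(max(level, 0), max(WARNING_STYLES))
    return WARNING_STYLES[clamped]
```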
In a more preferred embodiment, when several comparison thresholds are set for the risk coefficient, the thresholds may be derived from the historical data characteristics of risk points in past projects of different categories, for example by taking a reasonable median of the historical data, or by using an AI algorithm to set reasonable grading thresholds.
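Deriving thresholds from historical data can be sketched with the standard library. Using the median for a single threshold follows the text; using evenly spaced quantiles for several graded thresholds is an assumption of this example, as are the function names.

```python
from statistics import median, quantiles
from typing import List, Sequence


def single_threshold(historical_coefficients: Sequence[float]) -> float:
    """One threshold: the median of the historical risk coefficients."""
    return median(historical_coefficients)


def graded_thresholds(historical_coefficients: Sequence[float],
                      levels: int = 3) -> List[float]:
    """Several thresholds: cut points that split the history into levels + 1 groups.

    statistics.quantiles needs at least two data points and returns n - 1
    cut points, so n = levels + 1 yields exactly `levels` thresholds.
    """
    return quantiles(historical_coefficients, n=levels + 1)
```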
Further, the system further comprises:
a database sample matching module, configured to compare the project data in the big database according to the expenditure category information of the project, match similar projects, and form all similar projects confirmed as matching into a sub-database within the big database for unified data processing.
In a preferred implementation, a NoSQL database is used to acquire, store, and retrieve the basic data during data acquisition, database screening and matching, data processing, result output, and so on. NoSQL generally refers to non-relational databases, which emerged to meet the challenges posed by large data sets with many data types, especially in big data applications, and which can provide fast, scalable storage for big data. In a traditional relational database, the logical schema must be designed first, with the character length and type defined for each stored variable, so the data schema is static. In a big data environment the data schema changes dynamically, which traditional database technology handles poorly. Moreover, as data types expand, documents, reports, pictures, audio, video, and similar data cannot be conveniently stored in a relational database, yet these data types become data information required by this embodiment, so a NoSQL database is used for data acquisition.
In a preferred implementation, this embodiment uses a third-party big data cluster or an open-source Hadoop big data cluster.
Further, the audit data risk coefficient of the project is determined as follows:
[Formula image BDA0003177331540000061: expression for $P$, not reproduced in this text]
where $P$ is the audit data risk coefficient of the project;
$i$ is the index of the $i$-th expenditure category in the project;
$m$ is the total number of expenditure categories in the project;
$w_{i0}$ is the budget data for the $i$-th expenditure category in the project;
$w$ is the total budget data for all expenditure categories in the project;
$j$ is the index of the $j$-th similar project sample in the big database to which the project belongs;
$n$ is the total number of similar project samples in the big database to which the project belongs;
$q_{ij}$ is the expenditure data for the $i$-th expenditure category of the $j$-th similar project sample in the big database to which the project belongs;
$Q_j$ is the total expenditure data for all expenditure categories of the $j$-th similar project sample in the big database to which the project belongs;
$w_i$ is the actual expenditure data for the $i$-th expenditure category in the project;
$k$ is an adjustment coefficient.
In addition, the itemized data risk coefficient of the $i$-th expenditure category in the project is:
[Formula image BDA0003177331540000071: expression for $P_i$, not reproduced in this text]
where $P_i$ is the itemized data risk coefficient of the $i$-th expenditure category in the project.
Note: the data used in the audit data risk coefficient determination method are either all actual project data or all preprocessed data. Data preprocessing, data cleaning, data conversion, data integration, and data loading are all existing data processing techniques and are not described in detail here.
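The expressions for the risk coefficients appear in this text only as figure references, so they cannot be reproduced here. The sketch below therefore shows only how the defined inputs could be organized, and it uses an assumed scoring rule that is not the patented formula: it compares each category's actual expenditure, as a fraction of the total budget $w$, with the mean share $q_{ij}/Q_j$ of that category across similar project samples, scaled by $k$. The function names, the absolute-difference rule, and the omission of the per-category budget $w_{i0}$ are all assumptions made for illustration.

```python
def benchmark_shares(samples):
    """Mean share of each category over similar-project samples.

    samples: list of dicts mapping category -> expenditure q_ij.
    Q_j is taken as the sum of one sample's category expenditures
    and is assumed to be positive.
    """
    categories = sorted({c for s in samples for c in s})
    n = len(samples)
    shares = {}
    for c in categories:
        shares[c] = sum(s.get(c, 0.0) / sum(s.values()) for s in samples) / n
    return shares


def illustrative_deviation(actual, budget_total, samples, k=1.0):
    """ASSUMED scoring rule for illustration, not the formula in the patent figure.

    actual: dict mapping category -> actual expenditure w_i.
    budget_total: total budget w for all categories.
    Returns (overall score, per-category scores).
    """
    bench = benchmark_shares(samples)
    per_category = {
        c: k * abs(actual.get(c, 0.0) / budget_total - bench.get(c, 0.0))
        for c in set(actual) | set(bench)
    }
    return sum(per_category.values()), per_category


# Two similar projects split spending 60/40 and 40/60, so each category's
# benchmark share is 0.5; the audited project spends 80/20 of a budget of 100.
samples = [{"equipment": 60, "travel": 40}, {"equipment": 40, "travel": 60}]
overall, itemized = illustrative_deviation(
    {"equipment": 80, "travel": 20}, budget_total=100, samples=samples
)
```

Under this assumed rule a project whose spending mix matches the similar-project average scores zero, and the score grows as the mix diverges, which matches the stated intent that the coefficient measure deviation from the normal condition of similar projects.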
As a preferred mode, the entire implementation uses either a local computer processing system or a distributed processor system on a cloud computing platform. The basic data acquired by the front end is imported into the distributed system, for example a distributed database or a distributed storage cluster, and cleaning or preprocessing is performed after import. This meets the processing requirements of massive data; the import rate can often reach hundreds of megabits per second, or even gigabits per second. Preferably, the distributed processor system integrates dynamic load balancing with a group management and allocation mechanism: the platform monitors the running state of every node in real time and dynamically adjusts and balances the load on different resources across the whole system, which addresses the reasonable use and effective management of a large-scale system.
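The text does not specify the balancing algorithm. As one possible illustration, the sketch below uses a greedy least-loaded policy: each import batch is handed to whichever node currently carries the smallest total load. The function name, node names, and batch sizes are hypothetical.

```python
import heapq


def assign_batches(batch_sizes, node_names):
    """Greedy least-loaded assignment (an assumed policy, for illustration).

    Batches are handled largest first; each goes to the node with the
    smallest accumulated load so far.
    """
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}
    for size in sorted(batch_sizes, reverse=True):
        load, name = heapq.heappop(heap)      # node with the lowest load
        assignment[name].append(size)
        heapq.heappush(heap, (load + size, name))
    loads = {name: sum(sizes) for name, sizes in assignment.items()}
    return assignment, loads


assignment, loads = assign_batches([500, 300, 200, 200, 100], ["node-a", "node-b"])
```

A production platform would also react to node failures and to load measured at run time; this sketch only shows the balancing decision itself.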
Example 3
As shown in Fig. 5, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the audit data processing method described in Embodiment 1 above.
Example 4
As shown in Fig. 6, the present invention provides a computer device comprising a memory and a processor; the memory is configured to store a computer program, and the processor is configured to implement the audit data processing method described in Embodiment 1 when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention applies big data technology and establishes a suitable audit data risk model and algorithm. Through objective comparison with sample data of similar projects in the database, it can quickly and effectively identify data risks hidden in the basic information and corresponding data related to project audit, and it displays the results and issues early warnings through visualization technologies such as a knowledge graph;
(2) The invention acquires the basic information and corresponding data related to project audit together with sample data of similar projects in the big database. These data are updated dynamically in real time, which supports the timeliness and accuracy of project audit data processing. Because the sample data in the big database is continuously updated and expanded, the accuracy and matching degree of big data mining improve with use, enabling online, dynamic audit data processing;
(3) The invention first mines sample data from a sub-database made up of projects in the existing big data whose categories are the same as or similar to those of the project to be audited, then compares the sample data with the actual data of the project to be audited, and finally performs data processing and analysis to obtain the audit data risk coefficient of the whole project and the itemized data risk coefficient of each expenditure category in the project;
(4) The invention provides risk early warning according to the magnitude of the project's audit data risk coefficient: it judges whether the coefficient exceeds a preset threshold and, if so, sends early warning information reminding the recipient that the project carries audit risk. It also visually displays the project's audit data risk coefficient, the itemized data risk coefficient of each expenditure category, and the similar-project data drawn from the big database;
(5) The invention uses a local computer processing system, a distributed processor system on a cloud computing platform, or a combination of the two. The load on different resources across the whole system can be dynamically adjusted and balanced, which addresses the reasonable use and effective management of a large-scale system;
(6) The invention uses as sample data the project data in the big data that is the same as or similar to the project to be audited, and the sample data forms a sub-database that can be updated and expanded in real time. The basic data mined from the big data therefore has high similarity and consistency, which improves the accuracy and credibility of audit data processing;
(7) Before the audit data risk coefficient of the project is determined, the project data in the big database is compared against the expenditure category information of the project to match similar projects, and all similar projects confirmed as matches form a sub-database within the big database for unified data processing. Because the data is filtered and classified before calculation, the data processing flow is simplified and the computational load is reduced, which speeds up the system, improves accuracy, and reduces unnecessary interference;
(8) The data acquired by the invention, the big data mining results, other related data, and audit cases can be displayed visually using visualization technologies such as a knowledge graph and provided to other related systems, so that data can be shared and collaborative work supported.
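Two of the steps listed above, matching similar projects by expenditure category and grading the early warning, can be sketched as follows. The text does not define a similarity measure or the threshold values, so the Jaccard overlap, the `min_overlap` cutoff, and the threshold multipliers applied to the historical median are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import median


def match_similar_projects(target_categories, database, min_overlap=0.5):
    """Build a sub-database of projects with overlapping expenditure categories.

    Jaccard similarity and min_overlap are assumed, not taken from the source.
    """
    target = set(target_categories)
    matched = []
    for project in database:
        cats = set(project["categories"])
        union = target | cats
        score = len(target & cats) / len(union) if union else 0.0
        if score >= min_overlap:
            matched.append(project)
    return matched


def warning_thresholds(historical_coefficients, multipliers=(1.0, 1.5, 2.0)):
    """Derive ascending thresholds from the median of historical coefficients.

    The multipliers are hypothetical and must be given in ascending order.
    """
    base = median(historical_coefficients)
    return [base * m for m in multipliers]


def warning_level(risk_coefficient, thresholds):
    """Count the thresholds strictly exceeded; 0 means no early warning."""
    return bisect_left(thresholds, risk_coefficient)


database = [
    {"name": "A", "categories": ["equipment", "travel"]},
    {"name": "B", "categories": ["catering"]},
]
sub_database = match_similar_projects(["equipment", "travel", "labor"], database)
thresholds = warning_thresholds([0.2, 0.4, 0.3])   # median 0.3
levels = [warning_level(p, thresholds) for p in (0.25, 0.5, 0.7)]
```

With several thresholds, a higher risk coefficient maps to a higher warning level, which mirrors the graded early warning described for the method.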
Finally, it should be noted that the above description only illustrates preferred embodiments of the present invention and does not limit its scope. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention.

Claims (8)

1. An audit data processing method, the method comprising:
acquiring various kinds of basic information and corresponding data related to project audit, and sample data of similar projects in a big database;
determining an audit data risk coefficient of the project according to the various kinds of basic information and corresponding data related to project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects;
determining whether early warning is needed according to the audit data risk coefficient of the project;
wherein the audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning;
the determining whether early warning is needed according to the audit data risk coefficient of the project further comprises:
analyzing historical data and early warning conditions of past audit projects of the same type, determining a reasonable median of the early warning points and a corresponding early warning coefficient, and determining a reference standard for the risk coefficient on the basis of the median, so as to determine whether to issue an early warning;
a plurality of risk coefficient thresholds are set for early warning, so that early warning levels are reasonably divided according to different early warning points; that is, the higher the audit data risk coefficient, the higher the risk, and the more likely the corresponding data or project information is to trigger an early warning, or the higher the level of the triggered early warning;
the audit data risk coefficient of the project is determined as follows:
[Formula image FDA0004013555990000011: expression for $P$, not reproduced in this text]
where $P$ is the audit data risk coefficient of the project;
$i$ is the index of the $i$-th expenditure category in the project;
$m$ is the total number of expenditure categories in the project;
$w_{i0}$ is the budget data for the $i$-th expenditure category in the project;
$w$ is the total budget data for all expenditure categories in the project;
$j$ is the index of the $j$-th similar project sample in the big database to which the project belongs;
$n$ is the total number of similar project samples in the big database to which the project belongs;
$q_{ij}$ is the expenditure data for the $i$-th expenditure category of the $j$-th similar project sample in the big database to which the project belongs;
$Q_j$ is the total expenditure data for all expenditure categories of the $j$-th similar project sample in the big database to which the project belongs;
$w_i$ is the actual expenditure data for the $i$-th expenditure category in the project;
$k$ is an adjustment coefficient.
2. The audit data processing method of claim 1, wherein the method further comprises: after acquiring the various kinds of basic information and corresponding data related to project audit and the sample data of similar projects in the big database,
comparing the project data in the big database according to the expenditure category information of the project to match similar projects, and forming all similar projects confirmed as matches into a sub-database within the big database for unified data processing.
3. The audit data processing method of claim 1, wherein the various kinds of basic information and corresponding data related to project audit include: the project expenditure categories related to the project audit, total budget data for all expenditure categories, budget data for each expenditure category, and the corresponding actual expenditure data.
4. The audit data processing method of claim 1, wherein the sample data of similar projects in the big database includes: the project expenditure categories of the similar project samples in the big database, the total expenditure data of all expenditure categories, and the actual data corresponding to each expenditure category.
5. An audit data processing system, the system comprising:
a data acquisition module, configured to acquire various kinds of basic information and corresponding data related to project audit, and sample data of similar projects in a big database;
a big data storage module, configured to store the various audit-related data so that each module can store and retrieve the data;
a data processing module, configured to determine an audit data risk coefficient of the project according to the various kinds of basic information and corresponding data related to project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects, and to determine whether early warning is needed according to the audit data risk coefficient of the project;
a result visualization output module, configured to send early warning information and display the various visualized data;
wherein the audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning;
the determining whether early warning is needed according to the audit data risk coefficient of the project further comprises:
analyzing historical data and early warning conditions of past audit projects of the same type, determining a reasonable median of the early warning points and a corresponding early warning coefficient, and determining a reference standard for the risk coefficient on the basis of the median, so as to determine whether to issue an early warning;
a plurality of risk coefficient thresholds are set for early warning, so that early warning levels are reasonably divided according to different early warning points; that is, the higher the audit data risk coefficient, the higher the risk, and the more likely the corresponding data or project information is to trigger an early warning, or the higher the level of the triggered early warning;
the audit data risk coefficient of the project is determined as follows:
[Formula image FDA0004013555990000021: expression for $P$, not reproduced in this text]
where $P$ is the audit data risk coefficient of the project;
$i$ is the index of the $i$-th expenditure category in the project;
$m$ is the total number of expenditure categories in the project;
$w_{i0}$ is the budget data for the $i$-th expenditure category in the project;
$w$ is the total budget data for all expenditure categories in the project;
$j$ is the index of the $j$-th similar project sample in the big database to which the project belongs;
$n$ is the total number of similar project samples in the big database to which the project belongs;
$q_{ij}$ is the expenditure data for the $i$-th expenditure category of the $j$-th similar project sample in the big database to which the project belongs;
$Q_j$ is the total expenditure data for all expenditure categories of the $j$-th similar project sample in the big database to which the project belongs;
$w_i$ is the actual expenditure data for the $i$-th expenditure category in the project;
$k$ is an adjustment coefficient.
6. The audit data processing system of claim 5, wherein the system further comprises:
a database sample matching module, configured to compare the project data in the big database according to the expenditure category information of the project to match similar projects, and to form all similar projects confirmed as matches into a sub-database within the big database for unified data processing.
7. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the audit data processing method of any of claims 1 to 4.
8. A computer apparatus comprising a memory and a processor; the memory for storing a computer program; the processor, when executing the computer program, implementing an audit data processing method according to any of claims 1-4.
CN202110836103.1A 2021-07-23 2021-07-23 Audit data processing method, system, medium and device based on knowledge graph and big data Active CN113673828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110836103.1A CN113673828B (en) 2021-07-23 2021-07-23 Audit data processing method, system, medium and device based on knowledge graph and big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110836103.1A CN113673828B (en) 2021-07-23 2021-07-23 Audit data processing method, system, medium and device based on knowledge graph and big data

Publications (2)

Publication Number Publication Date
CN113673828A CN113673828A (en) 2021-11-19
CN113673828B true CN113673828B (en) 2023-04-07

Family

ID=78539948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110836103.1A Active CN113673828B (en) 2021-07-23 2021-07-23 Audit data processing method, system, medium and device based on knowledge graph and big data

Country Status (1)

Country Link
CN (1) CN113673828B (en)

Also Published As

Publication number Publication date
CN113673828A (en) 2021-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant