CN113673828B

CN113673828B - Audit data processing method, system, medium and device based on knowledge graph and big data

Info

Publication number: CN113673828B
Application number: CN202110836103.1A
Authority: CN
Inventors: 张莉; 王磊; 王宁宁; 李卓松
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2023-04-07
Anticipated expiration: 2041-07-23
Also published as: CN113673828A

Abstract

The invention discloses an audit data processing method, system, medium and device based on a knowledge graph and big data. The method comprises: obtaining various basic information and corresponding data related to a project audit, together with sample data of similar projects in a big database; determining an audit data risk coefficient of the project from the acquired data, wherein the audit data risk coefficient represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects; and determining, according to the audit data risk coefficient, whether an early warning is needed. By applying big data technology, the invention establishes a suitable audit data risk model and algorithm. Through objective comparison with sample data of similar projects in the database, it can quickly and effectively identify data risks hidden in the basic information and corresponding data related to the project audit, and it displays the results and issues early warnings through visualization technologies such as knowledge graphs.

Description

A method, system, medium and device for processing audit data based on knowledge graph and big data

Technical field

The invention belongs to the technical field of big data mining, and in particular relates to an audit data processing method, system, medium and device based on a knowledge graph and big data.

Background

With the arrival of the Internet information age, big data mining technology has been widely applied and has driven rapid development of data processing methods in many fields. In auditing, however, existing practice still relies mainly on traditional manual methods such as random sampling analysis. The results are presented as a list of issues, which gives report users no intuitive picture. Existing audit methods also stop at the preprocessing stage of standardizing and normalizing the basic data: they do not mine the large databases of different project categories to process and analyze audit project data systematically and comprehensively, nor do they objectively quantify, visualize, or issue early warnings about the level of risk in audit data. There is therefore an urgent need to use big data and knowledge graph technology to strengthen audit data processing and to improve audit efficiency and quality by technical means.

Summary of the invention

In view of the above problems, the present application provides an audit data processing method, system, medium and device based on a knowledge graph and big data. Specifically, the invention provides the following technical solutions:

In a first aspect, the invention provides an audit data processing method, the method comprising:

obtaining various basic information and corresponding data related to the project audit, and sample data of similar projects in a big database;

determining an audit data risk coefficient of the project according to the various basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects;

determining, according to the audit data risk coefficient of the project, whether an early warning is needed;

wherein the audit data risk coefficient of a project that needs an early warning is greater than that of a project that does not.

In a second aspect, the invention provides an audit data processing system, the system comprising:

a data acquisition module, configured to obtain various basic information and corresponding data related to the project audit, and sample data of similar projects in a big database;

a big data storage module, configured to store the various audit-related data so that each module can store and retrieve them;

a data processing module, configured to determine an audit data risk coefficient of the project according to the various basic information and corresponding data related to the project audit and the sample data of similar projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which the actual audit-related data of the project objectively deviates from the normal condition of similar projects, and to determine, according to the audit data risk coefficient of the project, whether an early warning is needed;

a result visualization output module, configured to issue early warning information and display the various visualization data.

In a third aspect, the invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the audit data processing method of the first aspect.

In a fourth aspect, the invention provides a computer device comprising a memory and a processor; the memory is configured to store a computer program, and the processor is configured to implement the audit data processing method of the first aspect when executing the computer program.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention applies big data technology to establish a suitable audit data risk model and algorithm. Through objective comparison with sample data of similar projects in the database, it can quickly and effectively identify data risks hidden in the basic information and corresponding data related to the project audit, and it displays results and issues early warnings through visualization technologies such as knowledge graphs;

(2) The data acquired by the invention include the various basic information and corresponding data related to the project audit, and sample data of similar projects in the big database. These data are real-time and dynamically updated, which supports the timeliness and accuracy of audit data processing. The sample data in the big database are also continuously updated and expanded, so the precision and matching degree of big data mining improve with use, enabling online, dynamic audit data processing;

(3) The audit data risk coefficient is determined by first mining the sample data in a sub-database composed of projects in the existing big data whose category is the same as or similar to that of the project to be audited, then comparing the mined results with the actual data of the project to be audited, and finally processing and analyzing the data to obtain the audit data risk coefficient of the whole project and the itemized data risk coefficient of each expenditure category in the project;

(4) The invention provides risk early warning according to the size of the audit data risk coefficient of the project. It judges whether the coefficient exceeds a preset threshold and, if so, issues early warning information to remind the recipient that the project carries audit risk. It also visualizes the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, and the similar-project data used from the big database;

(5) The invention may use a local computer processing system, a distributed processor system on a cloud computing platform, or a combination of the two. It can dynamically adjust and balance the load of different resources across the whole system, which addresses the rational use and effective management of large-scale systems;

(6) Because the invention uses as sample data those projects in the big data that are the same as or similar to the project to be audited, and because these samples form a sub-database that can be updated and expanded in real time, the base data for big data mining are highly similar and consistent, which improves the accuracy and credibility of audit data processing at its foundation;

(7) Before determining the audit data risk coefficient of the project, the invention compares the project's expenditure category information against the project data in the big database, matches similar projects, and groups all confirmed matches into a sub-database of the big database for unified processing. Because the data are filtered and classified before calculation, the data processing flow is simplified and the computational load is greatly reduced, so the system runs quickly and accurately with less unnecessary interference;

(8) The data acquired by the invention, related data such as big data mining results, and audit cases can all be provided to other related systems, allowing both data sharing and collaborative work.

Description of drawings

For ease of explanation, the invention is described in detail through the following specific embodiments and accompanying drawings.

Fig. 1 is a schematic flow chart of the method of the invention;

Fig. 2 is a schematic flow chart of another method of the invention;

Fig. 3 is a schematic structural diagram of the system of the invention;

Fig. 4 is a schematic structural diagram of another system of the invention;

Fig. 5 is a schematic diagram of the computer-readable storage medium of the invention;

Fig. 6 is a schematic diagram of the computer device of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment 1

As shown in Figs. 1-2, the invention provides an audit data processing method. In a specific embodiment, the method may preferably be carried out as follows:

Whether an early warning is needed is determined according to the audit data risk coefficient of the project; the risk coefficient is the basis for the final decision on whether to issue a warning or a reminder.

In a more specific embodiment, whether to issue an early warning may be determined by, for example, setting a threshold for the audit data risk coefficient and comparing the coefficient with that threshold. Alternatively, the historical data and warning records of past audit projects of the same type may be processed, for example with a BP (back-propagation) model, to determine a reasonable median of the warning points and their corresponding warning coefficients. This median then serves as the basis for a reference standard for the risk coefficient, from which it is decided whether to issue a warning.
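The median-based reference standard can be sketched in a few lines of Python. This is a minimal illustration rather than the embodiment's implementation: it skips the BP model entirely and simply takes the median of risk coefficients recorded at past warning points. The function names and the sample values are hypothetical.

```python
from statistics import median


def reference_threshold(historical_warning_coefficients):
    """Return the median of risk coefficients recorded at past warning points.

    Assumption: the history is a plain list of numbers; the source does not
    specify how these values are stored or produced.
    """
    if not historical_warning_coefficients:
        raise ValueError("at least one historical coefficient is required")
    return median(historical_warning_coefficients)


def needs_warning(risk_coefficient, threshold):
    """Flag a project when its coefficient exceeds the reference threshold."""
    return risk_coefficient > threshold


# Made-up historical values, for illustration only.
history = [0.18, 0.25, 0.22, 0.40, 0.31]
threshold = reference_threshold(history)
print(threshold, needs_warning(0.30, threshold))
```

Using the median rather than the mean keeps a few extreme past projects from dragging the reference value, which may be why the text asks for a "reasonable median".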

In a more preferred embodiment, when early warnings are issued, multiple thresholds for the risk coefficient may be set on the basis of historical data, so that warning levels can be divided sensibly according to the different warning points. This lets users of the system see the urgency or severity of a warning more clearly and coordinate the subsequent handling of the risk.

The audit data risk coefficient of a project that needs an early warning is greater than that of a project that does not. In a preferred embodiment, the larger the audit data risk coefficient, the greater the risk, and the more likely the corresponding data or project information is to trigger an early warning, or the higher the level of the warning triggered.
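A multi-threshold grading of this kind might look like the sketch below. The threshold values and level names are assumptions chosen for illustration; the source only states that several thresholds may be set and that a larger coefficient maps to a higher level.

```python
from bisect import bisect_left

# Hypothetical ascending thresholds and level names (not from the source).
LEVEL_THRESHOLDS = [0.2, 0.4, 0.6]
LEVEL_NAMES = ["none", "low", "medium", "high"]


def warning_level(risk_coefficient):
    """Map a risk coefficient to a warning level.

    bisect_left counts the thresholds strictly below the coefficient, so a
    level is reached only when the coefficient exceeds its threshold.
    """
    return LEVEL_NAMES[bisect_left(LEVEL_THRESHOLDS, risk_coefficient)]


for p in (0.1, 0.3, 0.5, 0.9):
    print(p, warning_level(p))
```

Because the thresholds are kept in ascending order, the mapping is monotonic: a larger coefficient can never yield a lower level.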

The various basic information and corresponding data related to the project audit include static data and dynamic data, and the relevant data are dynamically updated. Once a project audit is completed and the project is confirmed to be normal, its data are added as sample data to the sub-database of similar projects in the big database, for use in big data mining during subsequent audit data processing. This enables online, dynamic audit data processing.

Further, after the various basic information and corresponding data related to the project audit and the sample data of similar projects in the big database have been obtained, the project's expenditure category information is compared against the project data in the big database to match similar projects, and all confirmed matches are grouped into a sub-database of the big database for unified data processing.

Preferably, during matching, a filter method, wrapper method or embedded method is used to compare and screen features, further mining the group features and individual weight features of the audit project to improve the matching degree.
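As a rough illustration of grouping similar projects into a sub-database, the sketch below compares expenditure category sets with Jaccard overlap. This is a stand-in chosen for brevity, not the filter, wrapper or embedded feature screening the text names; the record layout, the `min_similarity` cutoff and the sample projects are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two category collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_sub_database(project_categories, database, min_similarity=0.6):
    """Keep the database records whose categories overlap enough with the project."""
    return [
        record
        for record in database
        if jaccard(project_categories, record["categories"]) >= min_similarity
    ]


# Made-up records, for illustration only.
database = [
    {"name": "A", "categories": {"labor", "equipment", "travel"}},
    {"name": "B", "categories": {"labor", "equipment", "materials"}},
    {"name": "C", "categories": {"labor", "equipment", "travel", "materials"}},
]
project = {"labor", "equipment", "travel"}
sub_database = build_sub_database(project, database)
print([record["name"] for record in sub_database])
```

Filtering before any risk calculation mirrors the stated benefit of the method: only the matched records are carried into the later, more expensive processing.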

Further, the various basic information and corresponding data related to the project audit include: the project expenditure categories relevant to the audit, the total budget data of all expenditure categories, the budget data of each expenditure category, and the corresponding actual expenditure data.

Further, the sample data of similar projects in the big database include: the project expenditure categories of the similar project samples, the total expenditure data of all expenditure categories, and the actual data corresponding to each category of expenditure.

As a more preferred embodiment, determining whether an early warning is needed according to the audit data risk coefficient of the project means judging whether the coefficient exceeds a preset threshold. If it does, early warning information is issued to remind the recipient that the project carries audit risk, and the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, and the similar-project data used from the big database are displayed visually, for example with knowledge graph visualization. Here, the itemized data risk coefficient of each expenditure category is the itemized data risk coefficient of the i-th expenditure category in the project, and the similar-project data are processed as a sub-database of the big database.

As a preferred implementation, this embodiment uses a NoSQL database to collect, store and retrieve the basic data throughout data acquisition, database screening and matching, data processing, and result output. NoSQL refers broadly to non-relational databases. They were created to address the challenges posed by large data sets with many data types, especially in big data applications, and can provide fast, scalable storage for big data. A traditional relational database requires the logical schema to be defined first, with a length and type set for each stored field, so its data schema is static. In a big data environment the data schema changes dynamically, which traditional database technology handles poorly. In addition, data types such as documents, reports, pictures, audio and video are difficult to store in a relational database, yet all of them are data required by this embodiment. A NoSQL database is therefore used for data collection.

As a preferred implementation, this embodiment uses a third-party big data cluster or an open-source Hadoop big data cluster.

Further, the audit data risk coefficient of the project is determined as follows:

where P is the audit data risk coefficient of the project;

i denotes the i-th expenditure category in the project;

m is the total number of expenditure categories in the project;

w_i0 is the budget data of the i-th expenditure category in the project;

W is the total budget data of all expenditure categories in the project;

j denotes the j-th similar project sample in the big database to which the project belongs;

n is the total number of similar project samples in the big database to which the project belongs;

q_ij is the data of the i-th expenditure category of the j-th similar project sample in the big database to which the project belongs;

Q_j is the total expenditure data of all expenditure categories of the j-th similar project sample in the big database to which the project belongs;

w_i is the actual expenditure data of the i-th expenditure category in the project;

k is the adjustment coefficient.

In addition, the itemized data risk coefficient of the i-th expenditure category in the project is as follows:

where P_i is the itemized data risk coefficient of the i-th expenditure category in the project.

Note: the data used in determining the audit data risk coefficient are either the actual project data or data that have been preprocessed. Data preprocessing, namely data cleaning, data conversion, data integration and data loading, uses existing data processing techniques and is not described in detail here.
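To show how the quantities defined above relate to one another, the sketch below computes a simple share-deviation measure. It is not the formula of the invention, which is not reproduced in this text; it is an assumed stand-in that compares each category's actual share of the total budget (w_i / W) with the mean share of that category across the n samples (q_ij / Q_j), scaled by k. The budget figure w_i0 is not used here, and all names and numbers are hypothetical.

```python
from math import isclose
from statistics import mean


def itemized_deviation(w_i, W, sample_spend_i, sample_totals, k=1.0):
    """Illustrative only: k * |w_i / W - mean_j(q_ij / Q_j)| for one category."""
    actual_share = w_i / W
    sample_share = mean(q / Q for q, Q in zip(sample_spend_i, sample_totals))
    return k * abs(actual_share - sample_share)


def project_deviation(actual, W, samples, sample_totals, k=1.0):
    """Illustrative only: sum of the itemized deviations over all m categories."""
    return sum(
        itemized_deviation(actual[c], W, [s[c] for s in samples], sample_totals, k)
        for c in actual
    )


# Made-up data: m = 2 categories, n = 2 similar project samples.
actual = {"labor": 60.0, "travel": 40.0}          # w_i
W = 100.0                                          # total budget
samples = [
    {"labor": 50.0, "travel": 50.0},               # q_i1
    {"labor": 40.0, "travel": 60.0},               # q_i2
]
sample_totals = [100.0, 100.0]                     # Q_j

p_labor = itemized_deviation(60.0, W, [50.0, 40.0], sample_totals)
p_total = project_deviation(actual, W, samples, sample_totals)
print(p_labor, p_total)
```

Whatever the exact formula, the inputs have this shape: one actual figure per category for the audited project, and an n-by-m table of sample expenditures with per-sample totals.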

Preferably, throughout the above implementation, a local computer processing system or a distributed processor system on a cloud computing platform is used. The basic data collected at the front end are imported into the distributed processor system, such as a distributed database or a distributed storage cluster, and cleaning or preprocessing is performed on import. This can meet the processing needs of massive data; the import volume can often reach the hundred-megabyte level per second, or even the gigabyte level. More preferably, the distributed processor system integrates dynamic load balancing and group management and allocation mechanisms. The platform can monitor the running state of every node in the system in real time and dynamically adjust and balance the load of different resources across the whole system, which addresses the rational use and effective management of large-scale systems.

Embodiment 2

As shown in Figs. 3-4, the invention provides an audit data processing system, the system comprising:

the data acquisition module, configured to obtain various basic information and corresponding data related to the project audit, and sample data of similar projects in the big database;

The stored data include the project data to be audited, sample data of various projects, the sub-database of the big database formed from all similar projects confirmed as matches, and preprocessing data, intermediate processing data, result data, visualization data, and so on;

As a more preferred embodiment, the result visualization output module can issue early warning information to remind the recipient that the project carries audit risk, and can use visualization technologies such as knowledge graphs to display the audit data risk coefficient of the project, the itemized data risk coefficient of each expenditure category, the similar-project data used from the big database, and other information;

As a more preferred embodiment, when different levels are set for risk coefficient warnings, multiple comparison thresholds may be set for the risk coefficient so that risk warnings can be graded sensibly. When the system issues audit risk warnings visually, the levels can be distinguished by, for example, different colors or different flashing frequencies of the prompt.

In a more preferred embodiment, when multiple comparison thresholds are set for the risk coefficient, the thresholds may be derived from the historical data characteristics of risk points in past projects of different categories, for example by taking a reasonable median of the historical data, or by using an AI algorithm to set reasonable classification thresholds.

Further, the system also comprises:

a database sample matching module, configured to compare the project's expenditure category information against the project data in the big database, match similar projects, and group all confirmed matches into a sub-database of the big database for unified data processing.

Preferably, throughout the above implementation, a local computer processing system or a distributed processor system on a cloud computing platform is used. The basic data collected at the front end are imported into the distributed processor system, such as a distributed database or a distributed storage cluster, and cleaning or preprocessing is performed on import. This can meet the processing needs of massive data; the import volume can often reach the hundred-megabyte level per second, or even the gigabyte level. Preferably, the distributed processor system integrates dynamic load balancing and group management and allocation mechanisms. The platform can monitor the running state of every node in the system in real time and dynamically adjust and balance the load of different resources across the whole system, which addresses the rational use and effective management of large-scale systems.

Embodiment 3

As shown in Fig. 5, the invention provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the audit data processing method of Embodiment 1 above.

Embodiment 4

As shown in Fig. 6, the invention provides a computer device, characterized in that it comprises a memory and a processor; the memory is configured to store a computer program, and the processor is configured to implement the audit data processing method of Embodiment 1 above when executing the computer program.

(8) The data acquired by the invention, related data such as big data mining results, and audit cases can all be displayed intuitively using visualization technologies such as knowledge graphs and provided to other related systems, allowing both data sharing and collaborative work.

Finally, it should be noted that the above are only preferred embodiments intended to illustrate the technical solutions of the invention, not to limit it in any form. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims

1. An audit data processing method, the method comprising:

acquiring various basic information and corresponding data related to project audit and sample data of the same kind of projects in a large database;

determining an audit data risk coefficient of the project according to the various basic information and corresponding data related to project audit and sample data of the same kind of project in the big database, wherein the audit data risk coefficient of the project is used for representing the degree that actual data related to audit in the project objectively deviates from the normal condition of the same kind of project;

determining whether early warning is needed or not according to the audit data risk coefficient of the project;

wherein the audit data risk coefficient of a project needing early warning is larger than that of a project needing no early warning;

the method for determining whether early warning is needed according to the audit data risk coefficient of the project further comprises the following steps:

calculating historical data and early warning conditions in the past audit projects of the same type, determining a reasonable median of an early warning point and a corresponding early warning coefficient, and determining a reference standard of a risk coefficient by taking the median as a basis so as to determine whether to perform early warning;

the threshold values of the risk coefficients related to the early warning are set to be multiple, so that the early warning levels are reasonably divided according to different early warning points, namely the higher the risk coefficient of the audit data is, the higher the risk is, the higher the possibility of triggering the early warning corresponding to the data or project information is, or the higher the level of the triggered early warning is;

the method for determining the audit data risk coefficient of the project specifically comprises the following steps:

wherein P is the audit data risk coefficient of the project;

i denotes the i-th expenditure category in the project;

m is the total number of expenditure categories in the project;

w_i0 is the budget data for the i-th expenditure category in the project;

w is the total budget data for all expenditure categories in the project;

j denotes the j-th similar project sample in the big database to which the project belongs;

n is the total number of similar project samples in the big database to which the project belongs;

q_ij is the data of the i-th expenditure category of the j-th similar project sample in the big database to which the project belongs;

Q_j is the total expenditure data of all expenditure categories of the j-th similar project sample in the big database to which the project belongs;

w_i is the actual expenditure data for the i-th expenditure category in the project;

k is an adjustment coefficient.
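
The early warning step of claim 1 can be illustrated with a minimal sketch. The claim only states that a median derived from past projects of the same type serves as the reference standard and that several thresholds divide the warning levels; it does not say how the thresholds follow from the median. The multipliers (1.0, 1.5 and 2.0 times the median) and the function names below are therefore assumptions made purely for illustration, and the risk coefficient itself is taken as an already computed input.

```python
from bisect import bisect_right
from statistics import median


def warning_thresholds(historical_coefficients, multipliers=(1.0, 1.5, 2.0)):
    """Derive ascending thresholds from the median of past risk coefficients.

    The multipliers are hypothetical; the claim only requires that the
    median is the basis of the reference standard.
    """
    base = median(historical_coefficients)
    return [base * m for m in sorted(multipliers)]


def warning_level(risk_coefficient, thresholds):
    """Return 0 for no warning, otherwise the number of thresholds reached.

    A larger risk coefficient can never yield a lower level, which matches
    the claim that higher coefficients trigger higher warning levels.
    """
    return bisect_right(thresholds, risk_coefficient)


if __name__ == "__main__":
    history = [0.8, 1.0, 1.2, 0.9, 1.1]  # made-up past coefficients
    thresholds = warning_thresholds(history)
    for p in (0.7, 1.2, 1.6, 2.5):
        print(p, warning_level(p, thresholds))
```

Keeping the thresholds in an ascending list means that adding or removing a warning level only changes the multipliers, not the lookup logic.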

2. The audit data processing method of claim 1, wherein the method further comprises: after acquiring the various basic information and corresponding data related to project audit and the sample data of the same kind of projects in the big database,

matching similar projects by comparing the project data in the big database according to the expenditure category information of the project, and forming all similar projects confirmed as matching into a sub-database within the big database for unified data processing.
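
The matching step of claim 2 can be sketched as follows. The claim does not specify how expenditure category information is compared, so this example assumes that two projects are similar when the Jaccard similarity of their category sets reaches a chosen minimum. The similarity measure, the `min_similarity` parameter, and the record layout (a dict with a `"categories"` key) are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of category names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_sub_database(project_categories, big_database, min_similarity=0.8):
    """Select the records whose expenditure categories match the project.

    big_database maps a project id to a record holding a "categories" list.
    The returned dict plays the role of the sub-database of similar projects.
    """
    return {
        project_id: record
        for project_id, record in big_database.items()
        if jaccard(project_categories, record["categories"]) >= min_similarity
    }


if __name__ == "__main__":
    big_database = {
        "A": {"categories": ["labor", "materials", "travel"]},
        "B": {"categories": ["labor", "materials"]},
        "C": {"categories": ["marketing"]},
    }
    project = ["labor", "materials", "travel"]
    print(sorted(build_sub_database(project, big_database, min_similarity=0.6)))
```

A stricter `min_similarity` yields a smaller but more homogeneous sub-database, while a looser value admits more samples at the cost of comparability.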

3. The audit data processing method of claim 1, wherein the various basic information and corresponding data related to project audit includes: the project expenditure categories related to the project audit, the total budget data for all expenditure categories, the budget data for each expenditure category, and the corresponding actual expenditure data.

4. The audit data processing method of claim 1, wherein the sample data of the same kind of projects in the big database includes: the project expenditure categories of the similar project samples in the big database, the total expenditure data of all expenditure categories, and the actual data corresponding to each expenditure category.
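
The inputs listed in claims 3 and 4 can be organized as simple records, as in the sketch below. The class and field names are hypothetical, and computing the totals as sums over the categories is an assumption; the claims treat the total budget and total expenditure as data that are acquired, not derived. The sketch only structures the inputs and does not reproduce the claimed risk coefficient formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProjectAuditData:
    """Audited project: budget (w_i0) and actual (w_i) per category."""
    budget: Dict[str, float]
    actual: Dict[str, float]

    @property
    def total_budget(self) -> float:
        # Corresponds to w; assumed here to be the sum of category budgets.
        return sum(self.budget.values())


@dataclass(frozen=True)
class SampleProject:
    """Similar project sample j: expenditure (q_ij) per category."""
    expenditure: Dict[str, float]

    @property
    def total_expenditure(self) -> float:
        # Corresponds to Q_j; assumed to be the sum over all categories.
        return sum(self.expenditure.values())


def missing_categories(project: ProjectAuditData, sample: SampleProject) -> List[str]:
    """Categories budgeted in the project but absent from the sample."""
    return sorted(set(project.budget) - set(sample.expenditure))


if __name__ == "__main__":
    project = ProjectAuditData(
        budget={"labor": 60.0, "materials": 40.0},
        actual={"labor": 70.0, "materials": 35.0},
    )
    sample = SampleProject(expenditure={"labor": 50.0})
    print(project.total_budget, sample.total_expenditure)
    print(missing_categories(project, sample))
```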

5. An audit data processing system, the system comprising:

a data acquisition module, used for acquiring various basic information and corresponding data related to project audit and sample data of the same kind of projects in a big database;

a big data storage module, used for storing various audit-related data and for allowing each module to store and retrieve data;

a data processing module, used for determining an audit data risk coefficient of the project according to the various basic information and corresponding data related to project audit and the sample data of the same kind of projects in the big database, wherein the audit data risk coefficient of the project represents the degree to which actual data related to audit in the project objectively deviates from the normal condition of the same kind of projects, and for determining whether early warning is needed according to the audit data risk coefficient of the project;

a result visualization output module, used for issuing early warning information and displaying various visualized data;

the audit data risk coefficient of a project that needs early warning is larger than that of a project that does not need early warning;

wherein P is the audit data risk coefficient of the project;

i denotes the i-th expenditure category in the project;

m is the total number of expenditure categories in the project;

w_i0 is the budget data for the i-th expenditure category in the project;

w is the total budget data for all expenditure categories in the project;

j denotes the j-th similar project sample in the big database to which the project belongs;

w_i is the actual expenditure data for the i-th expenditure category in the project;

k is an adjustment coefficient.

6. The audit data processing system of claim 5, wherein the system further comprises:

a database sample matching module, used for matching similar projects by comparing the project data in the big database according to the expenditure category information of the project, and for forming all similar projects confirmed as matching into a sub-database within the big database for unified data processing.

7. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the audit data processing method of any of claims 1 to 4.

8. A computer apparatus comprising a memory and a processor, wherein the memory is used for storing a computer program, and the processor, when executing the computer program, implements the audit data processing method of any of claims 1 to 4.