CN111914543A - Report legality detection method, device, electronic device and readable storage medium
- Publication number
- CN111914543A CN202010569830.1A
- Authority
- CN
- China
- Prior art keywords
- report
- detected
- result information
- information
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
Abstract
The present application provides a report validity detection method, apparatus, electronic device, and readable storage medium, applied to the technical field of natural language processing. The method includes: extracting detection elements from an irregularly formatted report to be checked by means of natural language understanding technology, and checking the statistical result information of the detection elements for consistency against the corresponding total result information in the report, thereby determining the validity of the report. This enables automatic detection of report validity and improves both the efficiency and the accuracy of the check.
Description
Technical field
The present application relates to the technical field of natural language understanding, and in particular to a report validity detection method, apparatus, electronic device, and readable storage medium.
Background
International bank card organizations (such as UnionPay, Mastercard, JCB, Diners Club, American Express, and Visa) usually conduct fund transactions with member banks every day and provide fund reports. To meet banking compliance requirements, member banks usually need to verify the validity of the report content. Because the reports are usually laid out in an irregular format intended for human reading, their validity is currently verified by hand. Manual verification, however, is time-consuming, labor-intensive, and error-prone.
Summary of the invention
The present application provides a report validity detection method, apparatus, electronic device, and readable storage medium that automate the detection of report validity, improving detection efficiency, reducing the labor cost of detection, and improving detection accuracy. The technical solution adopted in the present application is as follows:
In a first aspect, a report validity detection method is provided, the method including:
obtaining a report to be checked, the data of which is unstructured data;
extracting multiple pieces of detection element information from the report to be checked based on natural language understanding technology;
performing statistics on the extracted detection element information to obtain statistical result information;
checking the obtained statistical result information for consistency against the corresponding total result information in the report to be checked, and determining the validity of the report based on the result of the consistency check.
Optionally, the report to be checked is a file in TXT, PDF, or XML format.
Optionally, before the extracted detection element information is aggregated into statistical result information, the method further includes:
adding the extracted detection element information to an Excel sheet;
determining the statistical result information of the detection element information based on the statistical analysis functions of the Excel sheet.
Optionally, the method further includes:
determining at least one category of the multiple pieces of detection element information;
determining the statistical result information of the detection elements corresponding to each category;
wherein checking the obtained statistical result information for consistency against the corresponding total result information in the report to be checked includes:
checking the statistical result information of each category for consistency against the total result information of the corresponding category in the report to be checked.
In a second aspect, a report validity detection apparatus is provided, the apparatus including:
an obtaining module, configured to obtain a report to be checked, the data of which is unstructured data;
an extraction module, configured to extract multiple pieces of detection element information from the report to be checked based on natural language understanding technology;
a statistics module, configured to perform statistics on the extracted detection element information to obtain statistical result information;
a verification module, configured to check the obtained statistical result information for consistency against the corresponding total result information in the report to be checked, and to determine the validity of the report based on the result of the consistency check.
Optionally, the report to be checked is a file in TXT, PDF, or XML format.
Optionally, the apparatus further includes:
an adding module, configured to add the extracted detection element information to an Excel sheet;
a first determining module, configured to determine the statistical result information of the detection element information based on the statistical analysis functions of the Excel sheet.
Optionally, the apparatus further includes:
a second determining module, configured to determine at least one category of the multiple pieces of detection element information;
a third determining module, configured to determine the statistical result information of the detection elements corresponding to each category;
wherein the verification module is specifically configured to check the statistical result information of each category for consistency against the total result information of the corresponding category in the report to be checked.
In a third aspect, an electronic device is provided, the electronic device including:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to execute the report validity detection method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer instructions which, when run on a computer, enable the computer to execute the report validity detection method of the first aspect.
Compared with the prior art, in which reports are checked manually, the present application obtains a report to be checked whose data is unstructured; extracts multiple pieces of detection element information from the report based on natural language understanding technology; performs statistics on the extracted detection element information to obtain statistical result information; and checks the statistical result information for consistency against the corresponding total result information in the report, determining the validity of the report from the result of the consistency check. In other words, detection elements are extracted from an irregularly formatted report by natural language understanding technology, and their statistical result information is checked for consistency against the corresponding total result information in the report. The validity of the report is thus determined automatically, which improves both the efficiency and the accuracy of report validity detection.
Additional aspects and advantages of the present application are set forth in part in the description below; they will become apparent from that description or may be learned through practice of the present application.
Description of drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a report validity detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a report validity detection apparatus according to an embodiment of the present application;
FIG. 3 is an example diagram of a report detection flow according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 5 is an example of a card organization report;
FIG. 6 shows the layout obtained after the extracted detection elements are processed.
Detailed description
Embodiments of the present application are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present application and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The technical solutions of the present application, and how they solve the above technical problems, are described in detail below through specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.
An embodiment of the present application provides a report validity detection method. As shown in FIG. 1, the method may include the following steps:
Step S101: obtain a report to be checked, the data of which is unstructured data.
The report to be checked is a file in TXT, PDF, or XML format. Data in this kind of text cannot be directly aggregated or computed on.
Step S102: extract multiple pieces of detection element information from the report to be checked based on natural language understanding technology.
Step S103: perform statistics on the extracted detection element information to obtain statistical result information.
Specifically, extracting the detection elements through natural language understanding technology converts unstructured data, which cannot be statistically analyzed, into structured data that can.
Step S104: check the obtained statistical result information for consistency against the corresponding total result information in the report to be checked, and determine the validity of the report based on the result of the consistency check.
Specifically, if the statistical result information of the extracted detection elements is consistent with the corresponding total result information in the report, the report is correct (valid); if it is inconsistent, the report contains an error and is invalid. When a report is invalid, the corresponding error information can be annotated to alert staff.
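Steps S101 to S104 can be illustrated with a minimal sketch. It is not the applicant's implementation: it uses a regular expression as a stand-in for the extraction step, and the report layout, the `LINE_RE` pattern, the `TOTAL` label, and the function name `check_report` are all assumptions made for illustration. Amounts are parsed as `Decimal` so that the comparison with the stated total is exact.

```python
import re
from decimal import Decimal

# Hypothetical line layout "<LABEL>   <amount>"; real card-scheme reports differ.
LINE_RE = re.compile(r"^(?P<label>[A-Z ]+?)\s+(?P<amount>-?\d+(?:\.\d+)?)\s*$")


def check_report(text: str) -> bool:
    """Return True if the detail amounts add up to the amount on the TOTAL line."""
    details = []
    stated_total = None
    for line in text.splitlines():            # S101: the raw, unstructured text
        m = LINE_RE.match(line.strip())       # S102: extract label and amount
        if m is None:
            continue                          # headers, separators, free text
        amount = Decimal(m.group("amount"))
        if m.group("label").strip() == "TOTAL":
            stated_total = amount
        else:
            details.append(amount)
    if stated_total is None:
        return False                          # nothing to check against
    computed = sum(details, Decimal("0"))     # S103: aggregate the elements
    return computed == stated_total           # S104: consistency check


SAMPLE = """\
DAILY SETTLEMENT REPORT
-----------------------
PURCHASE      100.50
REFUND        -20.25
CASH ADVANCE   40.00
TOTAL         120.25
"""
```

Calling `check_report(SAMPLE)` accepts the sample because the three detail lines sum to the stated total; changing any single amount makes the check fail.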
Compared with the prior art, in which report validity must be verified manually, the embodiment of the present application obtains a report to be checked whose data is unstructured; extracts multiple pieces of detection element information from the report based on natural language understanding technology; performs statistics on the extracted detection element information to obtain statistical result information; and checks the statistical result information for consistency against the corresponding total result information in the report, determining the validity of the report from the result of the consistency check. In other words, detection elements are extracted from an irregularly formatted report by natural language understanding technology, and their statistical result information is checked for consistency against the corresponding total result information in the report. The validity of the report is thus determined automatically, which improves both the efficiency and the accuracy of report validity detection.
An embodiment of the present application provides a possible implementation in which, before the extracted detection element information is aggregated into statistical result information, the method includes:
adding the extracted detection element information to an Excel sheet;
determining the statistical result information of the detection element information based on the statistical analysis functions of the Excel sheet.
For example, the sum of multiple detection elements can be computed directly with the summation function of the Excel sheet and then compared with the corresponding total in the report to be checked. If the two agree, the report is valid; if they disagree, the report is invalid.
An embodiment of the present application provides a possible implementation in which the method further includes:
determining at least one category of the multiple pieces of detection element information;
determining the statistical result information of the detection elements corresponding to each category.
Specifically, the extracted detection elements may belong to multiple categories, for example elements relating to different products. The element totals for each product category can be computed separately and then compared with the element totals stated for each product in the report to be checked. If they agree, the report is valid; if they disagree, the report is invalid.
Checking the obtained statistical result information for consistency against the corresponding total result information in the report to be checked then includes:
checking the statistical result information of each category for consistency against the total result information of the corresponding category in the report to be checked.
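The per-category check can be sketched as follows, assuming the detection elements have already been extracted as (category, amount) pairs. The function name `check_by_category`, the input shapes, and the category labels are hypothetical; the source does not prescribe a data model.

```python
from collections import defaultdict
from decimal import Decimal


def check_by_category(records, stated_totals):
    """Compare per-category sums with the totals stated in the report.

    records: iterable of (category, amount) pairs already extracted.
    stated_totals: dict mapping category -> total printed in the report.
    Returns a dict mapping every category seen on either side -> True/False.
    """
    computed = defaultdict(lambda: Decimal("0"))
    for category, amount in records:
        computed[category] += Decimal(amount)
    categories = set(computed) | set(stated_totals)
    return {
        c: c in stated_totals and computed[c] == Decimal(stated_totals[c])
        for c in categories
    }


records = [("DEBIT", "10.00"), ("DEBIT", "5.50"), ("CREDIT", "7.25")]
result = check_by_category(records, {"DEBIT": "15.50", "CREDIT": "8.00"})
report_is_valid = all(result.values())
```

Returning one flag per category, rather than a single boolean, makes it possible to point staff at the category whose total does not reconcile.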
FIG. 3 shows an example flow for checking the validity of a report, where the report to be checked may be a Mastercard organization report. FIG. 5 shows an example of a card organization report. The Mastercard report here refers specifically to the T140 file report, whose key elements include the payment request type of the transaction, the payment request date, the amount (with debit or credit direction), transaction remarks, the fee amount (with debit or credit direction), the settlement date, the currency, and so on.
FIG. 6 shows the layout obtained after the extracted detection elements are processed. In this form, Excel formulas can be used to quickly verify validity or to perform other calculations.
The report to be checked may be a TXT, PDF, or XML report. The transaction elements are extracted through natural language understanding to obtain the table shown in FIG. 6, in which transactions can be filtered and counted by different columns, so that the validity of the report can be verified along several dimensions. For example, one can check whether Total really equals the sum of the preceding rows, and validate Count, the different Amount columns, and Fee separately. This verifies the self-consistency of the report.
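The multi-dimensional check against the Total row can be sketched as below. The column names in `FIELDS` and the function name `check_total_row` are assumptions chosen to mirror the Count, Amount, and Fee columns mentioned above; the actual columns of FIG. 6 are not reproduced in the text.

```python
from decimal import Decimal

FIELDS = ("count", "amount", "fee")  # hypothetical column names


def check_total_row(rows, total_row):
    """Return the fields whose column sum differs from the Total row."""
    mismatches = []
    for field in FIELDS:
        column_sum = sum((Decimal(r[field]) for r in rows), Decimal("0"))
        if column_sum != Decimal(total_row[field]):
            mismatches.append(field)
    return mismatches


rows = [
    {"count": "2", "amount": "30.00", "fee": "0.60"},
    {"count": "1", "amount": "12.50", "fee": "0.25"},
]
total = {"count": "3", "amount": "42.50", "fee": "0.90"}
bad_fields = check_total_row(rows, total)
```

In this sample the fee column sums to 0.85 while the Total row states 0.90, so only the fee column is reported as inconsistent.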
The detection flow is as follows:
Step 1: Input the card organization report file, then go to step 2.
Step 2: Based on the content to be extracted, build the regular expressions for keyword extraction, then go to step 3.
Step 3: Match the regular expressions in order, then go to step 4.
Step 4: If a match is found, go to step 5; otherwise go to step 2.
Step 5: Extract the key elements from the match, record them, and go to step 6.
Step 6: Determine whether the current match position in the text to be searched has reached the end of the document. If so, go to step 7; otherwise go to step 2.
Step 7: Perform statistical analysis on all records, compare the result with the original text report, and go to step 8.
Step 8: If the comparison succeeds, the report is valid; if it fails, the report is invalid. In either case go to step 9.
Step 9: End.
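The loop in steps 1 to 9 can be sketched as a cursor that advances through the document, trying an ordered list of regular expressions at each position. The two patterns, the `TXN` and `TOTAL` keywords, and the function names are hypothetical; they only illustrate the control flow, not the T140 layout.

```python
import re
from decimal import Decimal

# Hypothetical patterns, tried in order at each position (steps 2 and 3).
PATTERNS = [
    ("total", re.compile(r"TOTAL\s+(-?\d+\.\d{2})")),
    ("detail", re.compile(r"TXN\s+\w+\s+(-?\d+\.\d{2})")),
]


def scan(text):
    """Walk the text and record (kind, amount) for the earliest match each time."""
    records = []
    pos = 0
    while pos < len(text):                    # step 6: stop at end of document
        best = None
        for kind, rx in PATTERNS:             # step 3: match in order
            m = rx.search(text, pos)
            if m and (best is None or m.start() < best[1].start()):
                best = (kind, m)
        if best is None:                      # step 4: nothing left to find
            break
        kind, m = best
        records.append((kind, Decimal(m.group(1))))  # step 5: record the element
        pos = m.end()                         # continue after this match
    return records


def is_valid(text):
    """Steps 7 and 8: aggregate the details and compare with the single total."""
    records = scan(text)
    details = [amount for kind, amount in records if kind == "detail"]
    totals = [amount for kind, amount in records if kind == "total"]
    return len(totals) == 1 and sum(details, Decimal("0")) == totals[0]


SAMPLE = "TXN A001 10.00\nTXN A002 5.25\nTOTAL 15.25\n"
```

Because every pattern consumes at least one character, the cursor always moves forward and the loop terminates.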
FIG. 2 shows a report validity detection apparatus provided by an embodiment of the present application. The apparatus 20 includes an obtaining module 201, an extraction module 202, a statistics module 203, and a verification module 204, wherein:
the obtaining module 201 is configured to obtain a report to be checked, the data of which is unstructured data;
the extraction module 202 is configured to extract multiple pieces of detection element information from the report to be checked based on natural language understanding technology;
the statistics module 203 is configured to perform statistics on the extracted detection element information to obtain statistical result information;
the verification module 204 is configured to check the obtained statistical result information for consistency against the corresponding total result information in the report to be checked, and to determine the validity of the report based on the result of the consistency check.
Compared with the prior art, in which report validity must be verified manually, the report validity detection apparatus of this embodiment obtains a report to be checked whose data is unstructured; extracts multiple pieces of detection element information from the report based on natural language understanding technology; performs statistics on the extracted detection element information to obtain statistical result information; and checks the statistical result information for consistency against the corresponding total result information in the report, determining the validity of the report from the result of the consistency check. In other words, detection elements are extracted from an irregularly formatted report by natural language understanding technology, and their statistical result information is checked for consistency against the corresponding total result information in the report. The validity of the report is thus determined automatically, which improves both the efficiency and the accuracy of report validity detection.
Optionally, the report to be checked is a file in TXT, PDF, or XML format.
Optionally, the apparatus further includes:
an adding module, configured to add the extracted detection element information to an Excel sheet;
a first determining module, configured to determine the statistical result information of the detection element information based on the statistical analysis functions of the Excel sheet.
Optionally, the apparatus further includes:
a second determining module, configured to determine at least one category of the multiple pieces of detection element information;
a third determining module, configured to determine the statistical result information of the detection elements corresponding to each category;
wherein the verification module is specifically configured to check the statistical result information of each category for consistency against the total result information of the corresponding category in the report to be checked.
The report validity detection apparatus of this embodiment can execute the report validity detection method provided in the above embodiments of the present application. Its implementation principle is similar and is not repeated here.
The report validity detection apparatus provided by this embodiment is applicable to the method shown in the above embodiments, which is not described again here.
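An embodiment of the present application provides an electronic device. As shown in FIG. 4, the electronic device 40 includes a processor 401 and a memory 403. The processor 401 is connected to the memory 403, for example through a bus 402. The electronic device 40 may further include a transceiver 404. Note that in practice the transceiver 404 is not limited to one, and the structure of the electronic device 40 does not limit the embodiments of the present application. In the embodiments of the present application, the processor 401 implements the functions of the modules shown in FIG. 2. The transceiver 404 includes a receiver and a transmitter.
The processor 401 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 401 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 402 may include a path that carries information between the above components. The bus 402 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The memory 403 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 403 stores the application program code for executing the solution of the present application, and execution is controlled by the processor 401. The processor 401 executes the application program code stored in the memory 403 to implement the functions of the report validity detection apparatus provided by the embodiment shown in FIG. 2.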
The embodiment of the present application provides an electronic device. Compared with the prior art, in which the validity of a report must be verified manually, the embodiment of the present application acquires a report to be detected, the data of which is unstructured data; extracts a plurality of pieces of detection element information from the report to be detected based on natural language understanding technology; performs statistics on the extracted pieces of detection element information to obtain statistical result information; performs a consistency check between the obtained statistical result information and the corresponding total result information in the report to be detected; and determines the validity of the report to be detected based on the result of the consistency check. That is, detection elements are extracted from an irregular report to be detected by means of natural language understanding technology, and the statistical result information of the detection elements is checked for consistency against the corresponding total result information in the report to be detected, so as to determine the validity of the report. The validity of the report to be detected is thereby detected automatically, which improves both the efficiency and the accuracy of report validity detection.
The electronic device provided by the embodiment of the present application is applicable to the above method embodiments, and details are not repeated here.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method shown in the foregoing embodiments is implemented.
The embodiment of the present application provides a computer-readable storage medium. Compared with the prior art, in which the validity of a report must be verified manually, the embodiment of the present application acquires a report to be detected, the data of which is unstructured data; extracts a plurality of pieces of detection element information from the report to be detected based on natural language understanding technology; performs statistics on the extracted pieces of detection element information to obtain statistical result information; performs a consistency check between the obtained statistical result information and the corresponding total result information in the report to be detected; and determines the validity of the report to be detected based on the result of the consistency check. That is, detection elements are extracted from an irregular report to be detected by means of natural language understanding technology, and the statistical result information of the detection elements is checked for consistency against the corresponding total result information in the report to be detected, so as to determine the validity of the report. The validity of the report to be detected is thereby detected automatically, which improves both the efficiency and the accuracy of report validity detection.
The computer-readable storage medium provided by the embodiment of the present application is applicable to the foregoing method embodiments, and details are not repeated here.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may instead be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The above are only some of the embodiments of the present application. It should be noted that those of ordinary skill in the art may make a number of improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010569830.1A CN111914543A (en) | 2020-06-20 | 2020-06-20 | Report legality detection method, device, electronic device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010569830.1A CN111914543A (en) | 2020-06-20 | 2020-06-20 | Report legality detection method, device, electronic device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111914543A (en) | 2020-11-10 |
Family
ID=73237813
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010569830.1A (Pending), CN111914543A (en) | 2020-06-20 | 2020-06-20 | Report legality detection method, device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914543A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117479A (en) * | 2018-08-13 | 2019-01-01 | 数据地平线(广州)科技有限公司 | A kind of financial document intelligent checking method, device and storage medium |
CN110555212A (en) * | 2019-09-06 | 2019-12-10 | 北京金融资产交易所有限公司 | Document verification method and device based on natural language processing and electronic equipment |
CN110909226A (en) * | 2019-11-28 | 2020-03-24 | 达而观信息科技(上海)有限公司 | Financial document information processing method and device, electronic equipment and storage medium |
WO2020113401A1 (en) * | 2018-12-04 | 2020-06-11 | 北京比特大陆科技有限公司 | Data detection method, apparatus and device |
Non-Patent Citations (1)
Title |
---|
MISHIDEMUDONG: ""核字"、"核数"、"核逻辑" ——NLP助力智能金融文档核查", pages 2 - 11, Retrieved from the Internet <URL:https://blog.csdn.net/u010159842/article/details/100101904> * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263024B (en) | Data processing method, terminal device and computer storage medium | |
CN103391311B (en) | A kind of multi-platform between the method and system of consistency verification of data | |
WO2019047636A1 (en) | Withdrawal request review method, apparatus, electronic device and storage medium | |
CN107545422A (en) | A kind of arbitrage detection method and device | |
CN111241161A (en) | Invoice information mining method and device, computer equipment and storage medium | |
CN111767350A (en) | Data warehouse testing method, device, terminal equipment and storage medium | |
CN108009223B (en) | Method and device for detecting consistency of transaction data | |
CN109598599A (en) | A kind of refund processing method, device and equipment based on block chain | |
WO2020253065A1 (en) | Qualification appraisal method and apparatus based on data analysis, and server | |
US11138372B2 (en) | System and method for reporting based on electronic documents | |
CN116521662A (en) | Method, device, equipment and medium for detecting effect of data cleaning | |
CN110956166A (en) | Bill labeling method and device | |
CN112346993B (en) | A test method, device and equipment for an intelligence analysis engine | |
CN110046979A (en) | One kind converting account method and apparatus | |
CN112150260A (en) | Method, system, equipment and medium for verifying authenticity of business information of manufacturing enterprise | |
CN111914543A (en) | Report legality detection method, device, electronic device and readable storage medium | |
CN111932142A (en) | Method, device, equipment and storage medium for scheme grouping and data grouping | |
CN112270486A (en) | Data quality evaluation method and device, electronic equipment and readable medium | |
CN109409091B (en) | Method, device and equipment for detecting Web page and computer storage medium | |
CN111611242A (en) | A Method for Importing Excel Data into Database | |
CN111897844A (en) | Report validity detection method and device based on granularity information and electronic equipment | |
CN113779947A (en) | Method, device, device and storage medium for automatically generating accounting vouchers | |
US20130218587A1 (en) | Coverage Discovery | |
CN112700322B (en) | Order sampling detection method, order sampling detection device, electronic equipment and storage medium | |
CN110673888B (en) | Verification method and device for configuration file |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 2022-09-14. Address after: 25 Financial Street, Xicheng District, Beijing 100033. Applicant after: CHINA CONSTRUCTION BANK Corp. Address before: 25 Financial Street, Xicheng District, Beijing 100033. Applicant before: CHINA CONSTRUCTION BANK Corp.; Jianxin Financial Science and Technology Co.,Ltd. |