CN116975106A - Data processing method, device and computer readable storage medium - Google Patents

Publication number
CN116975106A
Authority
CN
China
Prior art keywords
card
data
rule
information
reference text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310716964.5A
Other languages
Chinese (zh)
Inventor
李先能
张颖异
初金泽
刘羡楠
蓝泽镕
吴秉霖
邹佳秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202310716964.5A
Publication of CN116975106A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The embodiment of the application provides a data processing method, a data processing device and a computer readable storage medium, wherein the method comprises the following steps: acquiring card data and reference document data; structuring the reference document data to obtain reference text data; detecting the card data based on a card rule base to obtain abnormal card information; and detecting the reference text data based on a reference document rule base to obtain abnormal reference text information. Because the card data is detected with the card rule base and the reference text data is detected with the reference document rule base, the embodiment of the application saves a great deal of labor cost and achieves higher accuracy and efficiency than manual screening.

Description

Data processing method, device and computer readable storage medium
Technical Field
The present application relates to the field of information processing, and in particular, to a data processing method, apparatus, and computer readable storage medium.
Background
The existing data screening approach generally collects and integrates data manually and then compares the data to screen out abnormal data. However, manual collection cannot handle large volumes of data and requires a lot of time and effort, and manual screening is inefficient and costly, so data screening activities are difficult to complete.
Disclosure of Invention
In view of the above, the present application provides a data processing method, apparatus and computer readable storage medium, which save a lot of labor cost and have higher accuracy and efficiency than manual screening, so as to solve the problem that data screening activities are difficult to complete. The specific technical scheme is as follows:
in a first aspect, the present application provides a data processing method, the method comprising:
acquiring card data and reference document data;
structuring the reference document data to obtain reference text data;
detecting the card data based on a card rule base to obtain abnormal card information;
and detecting the reference text data based on the reference document rule base to obtain abnormal reference text information.
In one possible implementation manner, the detecting the card data based on the card rule base to obtain abnormal card information includes:
extracting elements from the card data to obtain card elements;
determining a card element rule corresponding to the card element in the card rule base;
and detecting the card data based on the card element rule to obtain the abnormal card information.
In one possible implementation manner, the detecting the card data based on the card element rule to obtain the abnormal card information includes:
acquiring a rule script corresponding to the card element rule based on the card element rule;
and running the rule script based on the card data to obtain the abnormal card information.
In one possible implementation manner, the detecting the card data based on the card rule base to obtain abnormal card information includes:
based on each rule in the card rule base, rule card data corresponding to each rule are obtained from the card data;
and detecting rule card data corresponding to each rule based on each rule to obtain abnormal card information corresponding to each rule.
In one possible implementation manner, the detecting the reference text data based on the reference document rule base to obtain abnormal reference text information includes:
extracting elements from the reference text data to obtain reference text elements;
if the reference text element is preset identification information, acquiring card preset identification information from the card data corresponding to the reference text data;
if the card preset identification information is inconsistent with the preset identification information in the reference text data, determining the card preset identification information as abnormal card information, and determining the preset identification information in the reference text data as abnormal reference text information.
In one possible implementation manner, the structuring the reference document data to obtain reference text data includes:
performing format conversion on the reference document data to obtain a reference document picture;
and carrying out text recognition on the reference document picture to obtain the reference text data.
In a second aspect, the present application also provides a data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring card data and reference document data;
the processing module is used for carrying out structuring processing on the reference document data to obtain reference text data;
the card detection module is used for detecting the card data based on a card rule base to obtain abnormal card information;
and the reference text detection module is used for detecting the reference text data based on a reference document rule base to obtain abnormal reference text information.
In one possible implementation, the card detection module includes:
the card element extraction unit is used for extracting elements from the card data to obtain card elements;
the rule determining unit is used for determining card element rules corresponding to the card elements in the card rule base;
and the card detection unit is used for detecting the card data based on the card element rule to obtain the abnormal card information.
In one possible implementation manner, the reference text detection module includes:
a reference text element extraction unit, configured to perform element extraction on the reference text data to obtain a reference text element;
the acquisition unit is used for acquiring card preset identification information from card data corresponding to the reference text data if the reference text element is the preset identification information;
and the abnormal information determining unit is used for determining the card preset identification information as abnormal card information and determining the preset identification information in the reference text data as abnormal reference text information if the card preset identification information is inconsistent with the preset identification information in the reference text data.
In a third aspect, the application also provides a computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method of the first aspect or of any possible implementation manner of the first aspect.
In the embodiment of the application, card data and reference document data are acquired; the reference document data is structured to obtain reference text data; the card data is detected based on a card rule base to obtain abnormal card information; and the reference text data is detected based on a reference document rule base to obtain abnormal reference text information. Because the card data is detected with the card rule base and the reference text data is detected with the reference document rule base, a great deal of labor cost is saved, and the accuracy and the efficiency are higher than those of manual screening. Because the rule bases can be updated, extended and changed at any time, the embodiment of the application can add more rules effectively and simply.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data screening activity provided by an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 shows a flow chart of data screening provided by an embodiment of the present application;
FIG. 4 shows a schematic diagram of a card rule script code provided by an embodiment of the present application;
FIG. 5 shows a script code schematic diagram of a reference document rule provided by an embodiment of the present application;
FIG. 6 illustrates a schematic diagram of an operating code provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Data screening is a process of comparing data generated by a data generator in order to screen out abnormal data. In the embodiment of the application, data screening may be performed by a data manager: the data manager obtains the data generated by the data generator and compares it with the card data stored by the data manager, thereby obtaining the abnormal data. The overall flow of data screening is shown in fig. 1. The abnormal data is stored in the relevant medium together with the normal data, in the form of card data and reference document data. The reference document data is typically entered into the card after structured data has been formed from it.
In the prior art, data is collected manually, integrated and then compared in order to screen it. However, the prior art cannot screen and inspect large amounts of data, manual screening is inefficient and costly, and the correctness and validity of the screened data cannot be accurately verified, so data screening work progresses poorly.
Based on the above problems, the embodiment of the application uses big data and artificial intelligence technology to convert the reference document data into structured text data, and then writes preset rules in code so that the reference text data can be analyzed and screened in bulk. In this way, past reference data is checked with high efficiency and high accuracy, and data that clearly does not comply with the preset rules is found. Meanwhile, the data already recorded in the cards is screened with written rules. Both parts of the data are thus screened and checked for possible errors, which makes the data screening work efficient.
Referring to fig. 2, a flowchart of an embodiment of a data processing method according to an embodiment of the present application is shown, where the embodiment of the present application at least includes the following steps:
S1, acquiring card data and reference document data.
When data screening is needed, card data and reference document data can be obtained from a local database. As shown in fig. 1 and 3, the card data and the reference document data are stored in the management system. The embodiment of the application can acquire the card data and the reference document data from the management system and store the card data and the reference document data in the local database. In the embodiment of the application, the card data may be one or more card data, and the reference document data may be one or more reference document data.
The card data is generally a comma-separated values (csv) table and may include structured information such as a name, an identification number, a serial number, a card type, and an expiration date of the data recorded in the card.
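A minimal sketch of reading such a table is given below. The column names (`name`, `id_number`, `serial_number`, `card_type`, `expiry_date`) and the sample rows are assumptions made for illustration; the application only lists the kinds of fields a card may contain.

```python
import csv
import io

# Hypothetical column layout and sample rows, for illustration only.
SAMPLE_CSV = """name,id_number,serial_number,card_type,expiry_date
Zhang San,ID001,SN-2023-0001,A,2025-06-30
Li Si,ID002,SN-2023-0002,B,2024-01-15
"""

def load_cards(text):
    """Parse csv text into a list of dicts, one dict per card record."""
    return list(csv.DictReader(io.StringIO(text)))

cards = load_cards(SAMPLE_CSV)
print(cards[0]["serial_number"])
```

Keeping each card as a dict keyed by field name lets later rules refer to card elements by name.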
The reference document data is typically a dmp file, a tag image file format (Tag Image File Format, tif) file or a portable document format (Portable Document Format, pdf) file, and may include a name, an identification card number, a serial number, a reference document type, a reference document origin, and a reference document expiration date.
S2, structuring the reference document data to obtain reference text data.
The reference document data is unstructured data and needs to be structured to obtain the reference text data; that is, the conversion from an image medium to a text medium is realized by using image text recognition technology in computer vision.
One implementation manner of structuring the reference document data to obtain the reference text data may include:
S21, performing format conversion on the reference document data to obtain a reference document picture;
S22, performing text recognition on the reference document picture to obtain the reference text data.
The reference document picture is generally in jpg format; that is, the embodiment of the application converts the dmp file, the tif file or the pdf file into a picture file in jpg format. After the format conversion, text recognition may be performed by optical character recognition (Optical Character Recognition, OCR) technology to recognize the reference document picture as text data.
Because one reference document may include more than one picture, the embodiment of the application combines the multiple groups of text recognition results of one reference document based on the file name of the reference document data to obtain the reference text data of that reference document.
After the reference text data is obtained, the reference document data, the reference document pictures and the reference text data of each reference document are stored in one reference document file. The reference document files are processed in batches, and their file names are modified to follow a standard format, which facilitates subsequent searching and comparison. The reference document files can be stored in a folder hierarchy of year subfolder, month subfolder, date subfolder and a subfolder for each reference document, which facilitates subsequent searching.
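The merging of per-picture recognition results and the folder hierarchy can be sketched as follows. The OCR step itself is left out, and the page naming convention `<document name>_<page number>.jpg` as well as the function names are assumptions, since the application does not specify them.

```python
import re
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath

# Assumed naming convention for page pictures: "<document name>_<page number>.jpg".
PAGE_NAME = re.compile(r"^(?P<doc>.+)_(?P<page>\d+)\.jpg$")

def merge_pages(ocr_results):
    """Combine per-picture OCR text into one text per reference document.

    ocr_results maps a picture file name to the text recognized from it.
    """
    pages = defaultdict(list)
    for file_name, text in ocr_results.items():
        match = PAGE_NAME.match(file_name)
        if match is None:
            continue  # not a page picture under the assumed convention
        pages[match["doc"]].append((int(match["page"]), text))
    # Sort numerically so that page 10 follows page 9 rather than page 1.
    return {doc: "\n".join(t for _, t in sorted(items))
            for doc, items in pages.items()}

def storage_folder(root, doc_date, doc_name):
    """Build the year/month/date/document subfolder path."""
    return PurePosixPath(root, f"{doc_date.year:04d}", f"{doc_date.month:02d}",
                         f"{doc_date.day:02d}", doc_name)

merged = merge_pages({"doc_A_2.jpg": "second page",
                      "doc_A_1.jpg": "first page",
                      "doc_B_1.jpg": "only page"})
folder = storage_folder("archive", date(2023, 6, 16), "doc_A")
```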
In the embodiment of the application, the text recognition technology can be utilized to perform text recognition on the reference document picture, and various text recognition technologies exist in practical application, and the text recognition technology is not limited herein.
S3, detecting the card data based on the card rule base to obtain abnormal card information.
After the card data is obtained, the card data can be detected based on the card rule base. As shown in fig. 3, after the card data is obtained, the embodiment of the application can parse information such as the name, serial number, card type and validity period of the recorded data from each piece of card data, formulate a plurality of rules for places in the structured card data that may be non-compliant, and then write a script corresponding to each rule. It should be noted that the card rule base stores the detection rules for the card data, and the rules in the card rule base may be formulated in advance.
It should be noted that, the rules in the card rule base may be set according to the actual scenario, and the embodiment of the present application is not limited.
In the embodiment of the application, the abnormal card information can be obtained by traversing all rules in the card rule base based on the card data. That is, one implementation manner of detecting the card data based on the card rule base to obtain the abnormal card information may include:
S31, extracting elements from the card data to obtain card elements;
S32, determining card element rules corresponding to the card elements in the card rule base;
S33, detecting the card data based on the card element rules to obtain the abnormal card information.
According to the embodiment of the application, element extraction is carried out on each piece of card data to obtain card elements of each piece of card data, then card element rules corresponding to the card elements in the card rule base are determined, and the card data is detected based on the card element rules to obtain abnormal card information. That is, the embodiment of the application can detect each card data without missing the rule related to each card data.
The card element may include serial number information, category information, and date information. The serial number information indicates serial number data in the card data, the category information indicates a card category, and the date information indicates a recording date and an expiration date of data recorded in the card data. It should be noted that the card element may also include other information in the card data, which is not limited by the embodiment of the present application.
One implementation manner of detecting the card data based on the card element rule to obtain the abnormal card information includes the following steps:
S331, acquiring a rule script corresponding to the card element rule based on the card element rule;
S332, running the rule script based on the card data to obtain the abnormal card information.
According to the embodiment of the application, the rules in the rule base are compiled into the rule script by using the programming language, so that the abnormal information can be directly obtained by running the rule script, and the efficiency is improved. Fig. 4 shows rule script code for card element rules.
If the card element rule corresponding to the card element cannot be determined from the card rule base, the card data comprising the card element is marked, and when the card rule base is updated subsequently, the card rule base can be updated according to the marked card data.
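The element-driven flow of S31 to S33, including the marking of elements that have no rule, might look like the sketch below. The two rules, the field names and the representation of the rule base as a dict of Python functions are assumptions for illustration and are not the rule scripts of fig. 4.

```python
from datetime import date

# Hypothetical rule scripts: each takes a card record (dict) and returns
# an error message, or None when the card passes the rule.
def serial_has_prefix(card):
    if card["serial_number"].startswith("SN-"):
        return None
    return "serial number lacks SN- prefix"

def expiry_after_record(card):
    expiry = date.fromisoformat(card["expiry_date"])
    recorded = date.fromisoformat(card["record_date"])
    return "expiry date precedes record date" if expiry < recorded else None

# Card rule base: card element name -> rule scripts for that element.
CARD_RULE_BASE = {
    "serial_number": [serial_has_prefix],
    "expiry_date": [expiry_after_record],
}

def detect_card(card, rule_base):
    """Return (abnormal information, elements without a rule) for one card."""
    abnormal, unmatched = [], []
    for element in card:                    # S31: extract card elements
        rules = rule_base.get(element)      # S32: look up the element rules
        if rules is None:
            unmatched.append(element)       # mark for a later rule base update
            continue
        for rule in rules:                  # S33: run each rule script
            message = rule(card)
            if message:
                abnormal.append((element, message))
    return abnormal, unmatched

card = {"serial_number": "X-17", "record_date": "2023-06-16",
        "expiry_date": "2022-01-01", "card_type": "A"}
abnormal, unmatched = detect_card(card, CARD_RULE_BASE)
```

Returning the unmatched elements separately keeps the marking step distinct from the abnormal information, so that the rule base can be extended later without rerunning detection logic by hand.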
The embodiment of the application can also traverse all card data based on the rules in the card rule base to obtain abnormal card information. Namely, another implementation method for detecting card data based on the card rule base to obtain abnormal card information may include:
S34, based on each rule in the card rule base, obtaining rule card data corresponding to each rule from the card data;
S35, detecting the rule card data corresponding to each rule based on each rule to obtain abnormal card information corresponding to each rule.
According to the embodiment of the application, the card data are divided according to the rules in the card rule base to obtain the card data corresponding to different rules, and then the card data corresponding to the rules are detected according to the rules to obtain the abnormal card information. For example, the card rule base includes rule 1, rule card data corresponding to rule 1 is obtained from the card data, and then the obtained rule card data is detected based on rule 1, so that abnormal card information corresponding to rule 1 can be obtained.
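The rule-driven variant of S34 and S35 can be sketched as below, assuming (hypothetically) that each rule carries a selector that decides which cards it applies to and a check that decides whether a selected card is abnormal.

```python
# Each hypothetical rule pairs a selector (which cards it applies to)
# with a check (whether a selected card is abnormal).
RULES = {
    "rule 1": {
        "applies": lambda card: card["card_type"] == "A",
        "is_abnormal": lambda card: not card["serial_number"],
    },
}

def detect_by_rule(cards, rules):
    """Return, for each rule, the cards it applies to that fail its check."""
    result = {}
    for name, rule in rules.items():
        selected = [c for c in cards if rule["applies"](c)]              # S34
        result[name] = [c for c in selected if rule["is_abnormal"](c)]   # S35
    return result

cards = [
    {"card_type": "A", "serial_number": ""},
    {"card_type": "A", "serial_number": "SN-1"},
    {"card_type": "B", "serial_number": ""},
]
abnormal_by_rule = detect_by_rule(cards, RULES)
```

Here the type B card with an empty serial number is not reported, because rule 1 never selects it.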
S4, detecting the reference text data based on the reference document rule base to obtain abnormal reference text information.
After the reference text data is obtained, the reference text data can be detected based on the reference document rule base. The reference document rule base stores detection rules for reference text data, and it should be noted that the rules in the reference document rule base may be set according to an actual scene, and the embodiment of the present application is not limited.
The embodiment of the application can traverse all rules in the reference document rule base based on the reference text data, thereby obtaining abnormal reference text information. That is, an implementation manner of detecting reference text data based on a reference document rule base to obtain abnormal reference text information in the embodiment of the present application may include:
S41, extracting elements from the reference text data to obtain reference text elements;
S42, if the reference text element is not preset identification information, determining a reference text element rule corresponding to the reference text element in the reference document rule base;
S43, detecting the reference text data based on the reference text element rule to obtain abnormal reference text information;
S44, if the reference text element is preset identification information, acquiring card preset identification information from the card data corresponding to the reference text data;
S45, if the card preset identification information is inconsistent with the preset identification information in the reference text data, determining the card preset identification information as abnormal card information, and determining the preset identification information in the reference text data as abnormal reference text information.
After the reference text element is obtained in S41, it is determined whether the reference text element is preset identification information, if the reference text element is not the preset identification information, S42 is executed, and if the reference text element is the preset identification information, S44 is executed.
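The branch between S42 and S44 can be sketched as follows. Treating the serial number as the preset identification information, and the names `PRESET_ID`, `detect_reference_text` and `known_category`, are assumptions made for illustration only.

```python
PRESET_ID = "serial_number"  # assumed choice of preset identification element

def detect_reference_text(ref_elements, card, rule_base):
    """Check one reference text against its card and the rule base.

    ref_elements maps element names to values extracted from the
    reference text (S41). Returns (abnormal reference text information,
    abnormal card information).
    """
    abnormal_ref, abnormal_card = [], []
    for element, value in ref_elements.items():
        if element == PRESET_ID:
            card_value = card.get(PRESET_ID)             # S44
            if card_value != value:                      # S45
                abnormal_card.append((PRESET_ID, card_value))
                abnormal_ref.append((PRESET_ID, value))
        else:
            for rule in rule_base.get(element, []):      # S42
                message = rule(value)                    # S43
                if message:
                    abnormal_ref.append((element, message))
    return abnormal_ref, abnormal_card

def known_category(value):
    """Hypothetical rule: only categories A and B are valid."""
    return None if value in {"A", "B"} else f"unknown category {value!r}"

rule_base = {"category": [known_category]}
ref = {"serial_number": "SN-9", "category": "C"}
card = {"serial_number": "SN-8"}
abnormal_ref, abnormal_card = detect_reference_text(ref, card, rule_base)
```

Because a mismatch alone does not show which side is wrong, both the card value and the reference text value are reported, matching S45.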
The process of detecting the reference text data based on the reference text element rule to obtain the abnormal reference text information is similar to S33; refer to S33 for details. Fig. 5 shows rule script code for reference text element rules, and fig. 6 shows code that runs all rules in the reference document rule base. With this code, all reference text data can be screened against all rules to check whether any reference document contains an error specified by any rule.
As shown in fig. 3, an embodiment of the present application may define a Python function to locate the text position of a reference text element, which may include serial number information, category information, and date information. The abnormal reference text information may include the name of the reference document and the violated rule. It should be noted that the reference text element may further include other information in the reference text data, and the abnormal reference text information may further include other information related to the abnormal reference text data, which is not limited by the embodiment of the present application.
If the reference text element rule corresponding to the reference text element cannot be determined from the reference document rule base, marking the reference text data comprising the reference text element, and updating the reference document rule base according to the marked reference text data when the reference document rule base is updated subsequently.
Because date information may be written in Chinese characters or in Arabic numerals, the embodiment of the application uses cn2an to ensure the accuracy of element extraction when extracting date information. cn2an is a Python toolkit for quickly converting between Chinese numerals and Arabic numerals; even if numerals written in Chinese characters appear in the middle of the text, they can be successfully converted into Arabic numerals.
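To illustrate the kind of conversion involved, the following standard-library sketch normalizes dates of the form year 年 month 月 day 日 written with Chinese or Arabic numerals. It is a simplified stand-in written for this example and does not use or reproduce the cn2an API; the names `to_int` and `parse_date` are hypothetical, and only numerals that occur in calendar dates are handled.

```python
import re
from datetime import date

DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

def to_int(token):
    """Convert a date component written in Chinese or Arabic numerals."""
    if "十" in token:                        # e.g. 十, 十六, 二十, 三十一
        tens, _, ones = token.partition("十")
        return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[ones] if ones else 0)
    # Digit-by-digit form, e.g. 二〇二三 -> 2023; Arabic digits pass through.
    return int("".join(str(DIGITS.get(ch, ch)) for ch in token))

DATE = re.compile(r"([0-9〇零一二三四五六七八九]+)年"
                  r"([0-9一二三四五六七八九十]+)月"
                  r"([0-9一二三四五六七八九十]+)日")

def parse_date(text):
    """Return the first year-month-day date found in text, or None."""
    match = DATE.search(text)
    if match is None:
        return None
    year, month, day = (to_int(group) for group in match.groups())
    return date(year, month, day)

print(parse_date("签发日期：二〇二三年六月十六日"))
```

Normalizing every date to a `datetime.date` lets rules such as "is the date reasonable" compare values directly, whichever script the source document used.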
The embodiment of the application can also traverse all the reference text data based on the rules in the reference document rule base to obtain the abnormal reference text information. That is, another implementation of detecting reference text data based on a reference document rule base to obtain abnormal reference text information may include:
S46, acquiring rule reference text data corresponding to each rule from the reference text data based on each rule in the reference document rule base;
S47, detecting the rule reference text data corresponding to each rule based on each rule to obtain abnormal reference text information corresponding to each rule.
According to the embodiment of the application, the reference text data is divided according to the rules in the reference document rule base to obtain the reference text data corresponding to different rules, and then the reference text data corresponding to each rule is detected according to each rule to obtain the abnormal reference text information. For example, the reference document rule base includes rule 2, rule reference text data corresponding to rule 2 is obtained from the reference text data, and then the obtained rule reference text data is detected based on rule 2, so that abnormal reference text information corresponding to rule 2 can be obtained.
The embodiment of the application is not limited to the sequence relation among S2, S3 and S4, and can detect the card data and then carry out structuring treatment on the reference document data after the card data and the reference document data are acquired; or after the card data and the reference document data are obtained, firstly carrying out structuring treatment on the reference document data, and then detecting the card data; or detecting the card data after obtaining the reference text data, and then detecting the reference text data; or detecting the reference text data and then detecting the card data after obtaining the reference text data; the card data and the reference text data may also be detected simultaneously after the reference text data is obtained.
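Since card detection and reference text detection read different inputs, the simultaneous case can be sketched with a thread pool. The two detection functions below are placeholders invented for this example and do not correspond to any rule in the application.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_cards(cards):
    """Placeholder card detection: flag cards without a serial number."""
    return [c for c in cards if not c.get("serial_number")]

def detect_texts(texts):
    """Placeholder reference text detection: flag empty recognized texts."""
    return [name for name, text in texts.items() if not text.strip()]

cards = [{"serial_number": "SN-1"}, {"serial_number": ""}]
texts = {"doc_A": "some recognized text", "doc_B": "  "}

# S3 and S4 do not depend on each other, so both can be submitted together.
with ThreadPoolExecutor(max_workers=2) as pool:
    card_future = pool.submit(detect_cards, cards)
    text_future = pool.submit(detect_texts, texts)
    abnormal_cards = card_future.result()
    abnormal_texts = text_future.result()
```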
After the abnormal card information and the abnormal reference text information are obtained, the embodiment of the application outputs them to a rechecking terminal, so that rechecking personnel can check the abnormal card information and the abnormal reference text information through the rechecking terminal and screen out the erroneous card data or reference document data, thereby realizing the screening of the data.
In the embodiment of the application, card data and reference document data are acquired; the reference document data is structured to obtain reference text data; the card data is detected based on a card rule base to obtain abnormal card information; and the reference text data is detected based on a reference document rule base to obtain abnormal reference text information. Because the card data is detected with the card rule base and the reference text data is detected with the reference document rule base, a great deal of labor cost is saved, and the accuracy and the efficiency are higher than those of manual screening. Because the rule bases can be updated, extended and changed at any time, the embodiment of the application can add more rules effectively and simply.
In order to facilitate further understanding of the technical solution provided by the embodiments of the present application, a data processing system provided by the embodiments of the present application is taken as an example, and a data processing method provided by the embodiments of the present application is described in an overall exemplary manner. FIG. 7 is a schematic diagram of a data processing system according to an embodiment of the present application.
The data processing system comprises three main parts: 1) a data reading module; 2) a rule base construction module; and 3) a data element extraction and identification module.
1) Data reading module:
The conversion from an image medium to a text medium is accomplished by applying image text recognition technology in computer vision to the text stored in the storage medium. Rules are then written for the reference document data, so that past reference document data is supervised and erroneous reference document data is found; erroneous data in the card data is likewise found by writing rules for the card data.
For reading the structured card data, the embodiment of the application uses existing database reading techniques.
For reading the reference text data, the embodiment of the application first converts the dmp, tif and pdf files into picture files in jpg format. Second, after the conversion, text is recognized by OCR text recognition technology, so that the reference document pictures are recognized as text. Then, because one reference document may contain more than one picture, the embodiment of the application combines the multiple groups of text recognition results of one reference document based on a file name matching rule.
2) Rule base construction module:
The embodiment of the application constructs a card rule base for the structured card data and a reference document rule base for the reference text data.
First, the rules in the rule base are written as rule scripts in a programming language. Second, abstract module scripts are constructed for the serial number recognition module, the category recognition module, the date recognition module and the reference document batch processing. The resulting rule base is then stored in the form of a callable program. The programming language may be Python or Java; the embodiment of the present application does not limit the programming language. The reference document batch processing processes the reference document files in batches and modifies their file names to conform to a standard format, which facilitates subsequent searching and comparison and makes it easy to trace back to the file when a reference document is later output.
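One way to picture a rule base stored in callable form is a registry that maps rule names to checking functions. The Python sketch below is only an illustration under assumptions: the decorator `rule`, the dictionary `RULE_BASE`, the two sample rules and the serial number pattern are all hypothetical and are not taken from the application.

```python
import re

# Hypothetical rule base: rule name -> callable that returns True when the text passes.
RULE_BASE = {}


def rule(name):
    """Register a checking function in the rule base under a readable name."""
    def register(func):
        RULE_BASE[name] = func
        return func
    return register


@rule("serial number present")
def has_serial_number(text):
    # Assumed serial number shape for illustration, e.g. "No. 2023-000123".
    return re.search(r"No\.\s*\d{4}-\d+", text) is not None


@rule("text not empty")
def text_not_empty(text):
    return bool(text.strip())


def violated_rules(text):
    """Return the names of all registered rules that the text fails."""
    return [name for name, check in RULE_BASE.items() if not check(text)]
```

Adding a rule then amounts to writing one more decorated function, which matches the description's point that rules can be added at any time.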
3) Data element extraction and identification module:
First, based on the serial number recognition module, the category recognition module, the date recognition module, the reference document batch processing module and the like, the reference documents are checked for errors in several respects, such as whether a preset mark is present in the reference document and whether the dates are reasonable. Then, the reference documents that fail each rule are output; the output contains the name of each reference document and the rule it violates, so that the result can be handed directly to rechecking staff. Finally, the rechecking staff check the output manually and screen out the erroneous data, thereby completing the screening of past erroneous data.
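The screening and output steps can be sketched as follows in Python. This is a minimal illustration under assumptions: the preset mark `[SEAL]`, the ISO date format, the notion that a "reasonable" date is a valid calendar date not later than the screening date, and all function names are hypothetical choices, not details given by the application.

```python
import re
from datetime import date, datetime


def has_preset_mark(text, mark="[SEAL]"):
    # The mark string is a placeholder for whatever preset mark a rule looks for.
    return mark in text


def dates_reasonable(text, today):
    """True if every YYYY-MM-DD date in the text is valid and not in the future."""
    for raw in re.findall(r"\d{4}-\d{2}-\d{2}", text):
        try:
            found = datetime.strptime(raw, "%Y-%m-%d").date()
        except ValueError:
            return False  # e.g. 2023-02-30 is not a real calendar date
        if found > today:
            return False
    return True


def screen(documents, today):
    """Return (document name, violated rule) pairs for rechecking staff."""
    report = []
    for name, text in documents.items():
        if not has_preset_mark(text):
            report.append((name, "preset mark missing"))
        if not dates_reasonable(text, today):
            report.append((name, "date unreasonable"))
    return report


# Made-up sample documents.
documents = {
    "a.txt": "[SEAL] decided 2020-05-01",
    "b.txt": "decided 2023-02-30",
    "c.txt": "[SEAL] decided 2999-01-01",
}
report = screen(documents, today=date(2023, 6, 16))
```

Each entry in `report` pairs a document name with the rule it violates, which is the form of output the paragraph above describes.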
The embodiment of the application writes the rules as code, applying big data and artificial intelligence technology to keep the screening fair and to put big data screening into practice. The reference document files are recognized by OCR technology, so that reference documents stored as pictures can be converted completely into text form, which facilitates the application of subsequent techniques. The written rules, the erroneous reference documents found by screening and the rules they violate are output by code, so that rechecking staff can easily understand them and can carry out data correction based on the output. Artificial intelligence technology keeps the data screening fair and allows the screening process to be supervised. Because big data and artificial intelligence technology are adopted, the screening of the embodiment of the application is more accurate than manual screening, is fast, and saves a large amount of manpower. A program written according to rules can be updated, extended and changed freely at any time, which means that more rules can be added simply and effectively. The embodiment of the application is general across data, can be readily applied to other fields, and makes it convenient for rechecking staff in those fields to check past data.
Next, a data processing apparatus provided by the present application is described. The data processing apparatus described below and the data processing method described above may be referred to in correspondence with each other.
Referring to fig. 8, a schematic structural diagram of a data processing apparatus provided by the present application is shown, where the apparatus includes:
an acquiring module 801, configured to acquire card data and reference document data;
a processing module 802, configured to perform a structuring process on the reference document data to obtain reference text data;
the card detection module 803 is configured to detect the card data based on a card rule base, so as to obtain abnormal card information;
the reference text detection module 804 is configured to detect the reference text data based on a reference document rule base, and obtain abnormal reference text information.
In an embodiment of the present application, the card detection module 803 includes:
the card element extraction unit is used for extracting elements from the card data to obtain card elements;
the rule determining unit is used for determining card element rules corresponding to the card elements in the card rule base;
and the card detection unit is used for detecting the card data based on the card element rule to obtain the abnormal card information.
In an embodiment of the present application, the card detection unit includes:
a script acquisition subunit, configured to acquire a rule script corresponding to the card element rule based on the card element rule;
and the script operation subunit is used for operating the rule script based on the card data to obtain the abnormal card information.
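The chain from card element to card element rule to rule script can be sketched in Python as below. The field names (`serial`, `category`), the rule names, the serial number pattern and the allowed categories are hypothetical and serve only to make the sketch runnable.

```python
import re


def check_serial(card):
    # Assumed serial format for illustration: four digits, a hyphen, six digits.
    return re.fullmatch(r"\d{4}-\d{6}", card.get("serial", "")) is not None


def check_category(card):
    # Assumed set of allowed categories.
    return card.get("category") in {"A", "B", "C"}


# Card element -> card element rule (the "rule determining" step).
CARD_ELEMENT_RULES = {"serial": "serial format", "category": "category allowed"}
# Card element rule -> rule script (the "script acquisition" step).
RULE_SCRIPTS = {"serial format": check_serial, "category allowed": check_category}


def detect_card(card):
    """Run the rule script for each card element and collect abnormal information."""
    abnormal = []
    for element in card:  # element extraction: here, simply the fields present
        rule_name = CARD_ELEMENT_RULES.get(element)
        if rule_name is None:
            continue  # no rule is defined for this element
        script = RULE_SCRIPTS[rule_name]
        if not script(card):  # the "script operation" step
            abnormal.append(
                {"element": element, "rule": rule_name, "value": card[element]}
            )
    return abnormal
```

For example, `detect_card({"serial": "23-1", "category": "Z"})` reports both elements as abnormal, while a card whose fields satisfy both scripts yields an empty list.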
In an embodiment of the present application, the card detection module 803 includes:
a data acquisition unit, configured to acquire rule card data corresponding to each rule from the card data based on each rule in the card rule base;
and the data detection unit is used for detecting rule card data corresponding to each rule based on each rule to obtain abnormal card information corresponding to each rule.
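In this alternative, the loop runs over rules rather than over card elements: each rule first selects the card data it needs and then checks only that data. The Python sketch below assumes, for illustration only, that each rule declares the fields it needs and that each card has an `id` field; the rules, field names and sample cards are hypothetical.

```python
from datetime import date

# Hypothetical rules: each names the fields it needs and a check over those fields.
RULES = {
    "close date not before open date": {
        "fields": ("opened", "closed"),
        "check": lambda row: row["closed"] >= row["opened"],
    },
    "amount non-negative": {
        "fields": ("amount",),
        "check": lambda row: row["amount"] >= 0,
    },
}


def detect_by_rule(cards):
    """For each rule, select its rule card data and list the ids of failing cards."""
    abnormal = {}
    for name, spec in RULES.items():
        # Data acquisition: keep only cards that carry every field the rule needs,
        # and only those fields.
        rule_data = [
            (card["id"], {field: card[field] for field in spec["fields"]})
            for card in cards
            if all(field in card for field in spec["fields"])
        ]
        # Data detection: apply the rule to its own data.
        abnormal[name] = [cid for cid, row in rule_data if not spec["check"](row)]
    return abnormal


# Made-up sample cards.
cards = [
    {"id": 1, "opened": date(2022, 1, 5), "closed": date(2022, 3, 1), "amount": 10},
    {"id": 2, "opened": date(2022, 6, 1), "closed": date(2022, 5, 1), "amount": -4},
    {"id": 3, "amount": 7},  # has no dates, so the date rule does not apply
]
result = detect_by_rule(cards)
```

Keying the result by rule name gives the abnormal card information corresponding to each rule, as the two units above describe.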
In an embodiment of the present application, the reference text detection module 804 includes:
a reference text element extraction unit, configured to perform element extraction on the reference text data to obtain a reference text element;
an acquisition unit, configured to acquire card preset identification information from the card data corresponding to the reference text data if the reference text element is preset identification information;
and an abnormal information determining unit, configured to determine the card preset identification information as abnormal card information and determine the preset identification information in the reference text data as abnormal reference text information if the card preset identification information is inconsistent with the preset identification information in the reference text data.
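The consistency check between the card and the reference text can be sketched in Python as follows. The identifier format (`Case No. ...`), the card field `case_no` and the function names are assumptions made for this illustration; the application does not specify what the preset identification information looks like.

```python
import re


def extract_identifier(text):
    """Extract the preset identification information from the reference text.

    The "Case No. <id>" pattern is a hypothetical format used for illustration.
    """
    match = re.search(r"Case No\.\s*([A-Z0-9-]+)", text)
    return match.group(1) if match else None


def compare_identifiers(card, text):
    """Return (abnormal card info, abnormal reference text info), or (None, None)."""
    text_id = extract_identifier(text)
    if text_id is None:
        return (None, None)  # no preset identification element in the text
    card_id = card.get("case_no")
    if card_id != text_id:
        # When the two disagree, both sides are flagged, since either may be wrong.
        return (card_id, text_id)
    return (None, None)
```

Flagging both values rather than picking a winner leaves the decision about which record is wrong to the rechecking staff.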
In an embodiment of the present application, the processing module 802 includes:
the format conversion unit is used for carrying out format conversion on the reference document data to obtain a reference document picture;
and the text recognition unit is used for carrying out text recognition on the reference document picture to obtain the reference text data.
The present application also provides a computer readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method described in the method embodiments above.
In the embodiment of the application, the acquisition module acquires card data and reference document data; the processing module structures the reference document data to obtain reference text data; the card detection module detects the card data based on the card rule base to obtain abnormal card information; and the reference text detection module detects the reference text data based on the reference document rule base to obtain abnormal reference text information. Because the card rule base is used to detect the card data and the reference document rule base is used to detect the reference text data, a great deal of labor cost is saved, and accuracy and efficiency are higher than with manual screening. Because the rule bases can be updated and extended at any time, more rules can be added simply and effectively.
It should be noted that identical and similar parts of the embodiments may be referred to one another. The apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant points, refer to the description of the method embodiments.
For simplicity of explanation, the foregoing embodiments are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required by the present application.
Finally, it is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of the present application.

Claims (10)

1. A method of data processing, the method comprising:
acquiring card data and reference document data;
carrying out structuring treatment on the reference document data to obtain reference text data;
detecting the card data based on a card rule base to obtain abnormal card information;
and detecting the reference text data based on a reference document rule base to obtain abnormal reference text information.
2. The method of claim 1, wherein the detecting the card data based on the card rule base to obtain abnormal card information comprises:
extracting elements from the card data to obtain card elements;
determining a card element rule corresponding to the card element in the card rule base;
and detecting the card data based on the card element rule to obtain the abnormal card information.
3. The method of claim 2, wherein detecting the card data based on the card element rule to obtain the abnormal card information comprises:
acquiring a rule script corresponding to the card element rule based on the card element rule;
and running the rule script based on the card data to obtain the abnormal card information.
4. The method of claim 1, wherein the detecting the card data based on the card rule base to obtain abnormal card information comprises:
based on each rule in the card rule base, rule card data corresponding to each rule are obtained from the card data;
and detecting rule card data corresponding to each rule based on each rule to obtain abnormal card information corresponding to each rule.
5. The method according to claim 1, wherein the detecting the reference text data based on the reference document rule base to obtain abnormal reference text information includes:
extracting elements from the reference text data to obtain reference text elements;
if the reference text element is preset identification information, acquiring card preset identification information from card data corresponding to the reference text data;
if the card preset identification information is inconsistent with the preset identification information in the reference text data, the card preset identification information is determined to be abnormal card information, and the preset identification information in the reference text data is determined to be abnormal reference text information.
6. The method according to any one of claims 1 to 5, wherein the structuring the reference document data to obtain reference text data includes:
performing format conversion on the reference document data to obtain a reference document picture;
and carrying out text recognition on the reference document picture to obtain the reference text data.
7. A data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring card data and reference document data;
the processing module is used for carrying out structuring processing on the reference document data to obtain reference text data;
the card detection module is used for detecting the card data based on a card rule base to obtain abnormal card information;
and the reference text detection module is used for detecting the reference text data based on a reference document rule base to obtain abnormal reference text information.
8. The apparatus of claim 7, wherein the card detection module comprises:
the card element extraction unit is used for extracting elements from the card data to obtain card elements;
the rule determining unit is used for determining card element rules corresponding to the card elements in the card rule base;
and the card detection unit is used for detecting the card data based on the card element rule to obtain the abnormal card information.
9. The apparatus of claim 7, wherein the reference text detection module comprises:
a reference text element extraction unit, configured to perform element extraction on the reference text data to obtain a reference text element;
the acquisition unit is used for acquiring card preset identification information from card data corresponding to the reference text data if the reference text element is the preset identification information;
and the abnormal information determining unit is used for determining the card preset identification information as abnormal card information and determining the preset identification information in the reference text data as abnormal reference text information if the card preset identification information is inconsistent with the preset identification information in the reference text data.
10. A computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 6.
CN202310716964.5A 2023-06-16 2023-06-16 Data processing method, device and computer readable storage medium Pending CN116975106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310716964.5A CN116975106A (en) 2023-06-16 2023-06-16 Data processing method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116975106A true CN116975106A (en) 2023-10-31

Family

ID=88470294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310716964.5A Pending CN116975106A (en) 2023-06-16 2023-06-16 Data processing method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116975106A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189063A (en) * 2019-07-02 2019-08-30 山东鸿业信息科技有限公司 Case quality previewing system
CN111899132A (en) * 2020-08-28 2020-11-06 四川省广安市人民检察院 Method for automatically identifying case not setting up case within specified period
CN114444477A (en) * 2021-07-29 2022-05-06 北京法意科技有限公司 Administrative law enforcement case quality supervision method and system
CN115330168A (en) * 2022-08-08 2022-11-11 复旦大学 Data lineage-based inspection service flow abnormity detection method
CN115470177A (en) * 2021-06-11 2022-12-13 中国移动通信集团重庆有限公司 File processing method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination