WO2021196935A1 - Data checking method and apparatus, electronic device, and storage medium - Google Patents

Data checking method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021196935A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
field name
type
image file
data
Prior art date
Application number
PCT/CN2021/078082
Other languages
French (fr)
Chinese (zh)
Inventor
刘振涛
Original Assignee
深圳壹账通智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021196935A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • This application relates to the field of data processing technology, and in particular to a data verification method and apparatus, an electronic device, and a storage medium.
  • A data verification method includes: obtaining the business type of a target business and the image files that the target business requires to be verified;
  • determining the file type of the image file from the image file, determining the file type of the target verification file from the business type and the file type of the image file, and determining the data source identifier and the file identifier of the target verification file from the image file, where the target verification file is the file against which the image file is verified;
  • inputting the business type and the file type of the image file into a pre-trained first machine learning model, which outputs the first field name that needs to be verified in the image file of that file type; the pre-trained first machine learning model is obtained by training on sample data that includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file;
  • inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name; the pre-trained second machine learning model is obtained by training on sample data that includes the business type, the file type of the target verification file, and the second field name in the target verification file, and the field value data under the second field name is used to verify the field value data under the first field name;
  • obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source identifier and the file identifier of the target verification file;
  • verifying the field value data under the first field name against the field value data under the second field name in the target verification file.
  • A data verification device includes: a first acquisition unit, configured to obtain the business type of a target business and the image files that the target business requires to be verified;
  • a first execution unit, configured to determine the file type of the image file from the image file, determine the file type of the target verification file from the business type and the file type of the image file, and determine the data source identifier and the file identifier of the target verification file from the image file, where the target verification file is the file against which the image file is verified;
  • a second execution unit, configured to input the business type and the file type of the image file into the pre-trained first machine learning model, which outputs the first field name that needs to be verified in the image file of that file type; the pre-trained first machine learning model is obtained by training on sample data that includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file;
  • a third execution unit, configured to input the file type of the image file, the business type, the file type of the target verification file, and the first field name into the pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name; the pre-trained second machine learning model is obtained by training on sample data that includes the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data under the first field name;
  • a verification unit, configured to obtain the target verification file and to verify the field value data under the first field name against the field value data under the second field name in the target verification file.
  • An electronic device includes a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the data verification method described above: obtaining the business type and the image files; determining the file type of the image file, the file type of the target verification file, and the data source identifier and file identifier of the target verification file; determining the first field name with the pre-trained first machine learning model and the second field name with the pre-trained second machine learning model; obtaining the field value data under the first field name and the target verification file; and verifying the field value data under the first field name against the field value data under the second field name.
  • A storage medium stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data verification method described above.
  • This application can quickly and accurately verify each image file.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • Fig. 2 is a flowchart of a data verification method shown in an exemplary embodiment of the application.
  • FIG. 3 is a specific flowchart of step S220 of the data verification method shown in an exemplary embodiment of the application.
  • Fig. 4 is a flowchart of a data verification method shown in an exemplary embodiment of the application.
  • Fig. 5 is a flowchart of a data verification method shown in an exemplary embodiment of the application.
  • Fig. 6 is a block diagram of a data verification device shown in an exemplary embodiment of the present application.
  • Fig. 7 is an exemplary block diagram of an electronic device for implementing the foregoing data verification method according to an exemplary embodiment of the present application.
  • Fig. 8 shows a computer-readable storage medium for implementing the above-mentioned data verification method according to an exemplary embodiment of the present application.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • As shown in FIG. 1, the system architecture may include a client (one or more of the smart phone 101, the tablet computer 102, and the portable computer 103, or alternatively a desktop computer or the like), a network 104, and a server 105.
  • The network 104 is the medium that provides a communication link between the client and the server 105.
  • The network 104 may include various connection types, such as wired communication links and wireless communication links.
  • The numbers of clients, networks, and servers in FIG. 1 are merely illustrative. There can be any number of clients, networks, and servers according to implementation needs.
  • The server 105 may be a server cluster composed of multiple servers. The user can use the client to interact with the server 105 through the network 104 to receive or send messages.
  • The server 105 can be a server that provides various services, such as a data verification service.
  • The client obtains the business type of the target business and the image files that the target business requires to be verified; determines the file type of the image file from the image file; and determines the file type of the target verification file from the business type and the file type of the image file.
  • The client determines the data source identifier and the file identifier of the target verification file from the image file, where the target verification file is the file against which the image file is verified.
  • The client inputs the business type and the file type of the image file into the pre-trained first machine learning model, which outputs the first field name that needs to be verified in the image file. The pre-trained first machine learning model is obtained by training on sample data that includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file.
  • The client inputs the file type of the image file, the business type, the file type of the target verification file, and the first field name into the pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name.
  • The pre-trained second machine learning model is obtained by training on sample data that includes the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data under the first field name. The client obtains the field value data under the first field name according to the first field name, obtains the target verification file according to its data source information and file identifier, and verifies the field value data under the first field name against the field value data under the second field name in the target verification file.
  • With the pre-trained first machine learning model, the first field name that needs to be verified in each image file can be determined quickly from the business type of the target business and the image file to be verified, which avoids verifying field value data under other field names that do not need verification.
  • With the pre-trained second machine learning model, the second field name in the target verification file can be determined from the file type of the image file, the business type, the file type of the target verification file, and the first field name. The verification file to be used, and the second field name whose field value data effectively checks the field value data under the first field name, are thereby determined quickly and accurately.
  • the data verification method provided in the embodiments of the present application is generally executed by the client, and correspondingly, the data verification device is generally set in the client.
  • the server 105 may also have similar functions as the client, so as to execute the solution of the data verification method provided in the embodiments of the present application. The implementation details of the technical solutions of the embodiments of the present application will be described in detail below.
  • FIG. 2 is a flowchart of a data verification method shown in an exemplary embodiment of this application.
  • The data verification method in this embodiment is executed by the client, as shown in FIG. 1. It may include the following steps S210 to S260, which are described in detail below.
  • In step S210, the business type of the target business and the image files that the target business requires to be verified are obtained.
  • The target business refers to a specific business that the user can handle.
  • For example, the business type may be an insurance policy loan, a mortgage loan, or a personal housing loan.
  • The image file to be verified for the target business is the image file that must be checked when the user handles that business.
  • The user can enter the business type and the image files to be verified through the virtual buttons provided on the business handling page of the client.
  • There may be one or more image files; the number is determined by the actual needs of the business being handled.
  • In step S220, the file type of the image file is determined from the image file, the file type of the target verification file is determined from the business type and the file type of the image file, and the data source identifier and the file identifier of the target verification file are determined from the image file, where the target verification file is the file against which the image file is verified.
  • The file type of the image file refers to the file type determined after the image file is recognized.
  • Image files differ in file type.
  • For example, the file type of an image file can be an ID card, an insurance policy, a real estate certificate, or a mortgage contract.
  • The file type can be determined from the character data contained in the image file.
  • FIG. 3 is a specific flowchart of step S220 of the data verification method shown in an exemplary embodiment of the application.
  • Step S220 may include step S310 to step S320, which are described in detail as follows.
  • Step S310: perform OCR character recognition on the image file to obtain recognized text information.
  • When the file type of the image file is determined from the image file, OCR character recognition may first be performed on the image file to obtain the recognized text information.
  • The recognized text information is the set of character data obtained by recognizing all the character data in the image file.
  • The character data set includes the character string corresponding to each field name in the image file and the character string corresponding to the field value data under each field name.
  • For example, the character strings corresponding to field names may be "Insured name", "Insurance amount", "Insurance company name", and "Insurance policy number".
  • The string corresponding to the field value data under the field name "Insured name" can be "Zhang San".
  • The string corresponding to the field value data under the field name "Insurance amount" can be "10000.00".
  • The string corresponding to the field value data under the field name "Insurance company name" can be "Ping An Insurance Company of China".
  • The string corresponding to the field value data under the field name "Insurance policy number" can be "5485426232".
  • Step S320: determine the file type of the image file according to the key field names included in the recognized text information.
  • Image files can be classified by the key field names that distinguish one file type from another, which determines the file type of the image file. For example, in a loan scenario, if the text information recognized from an image file contains the four key field names "name of applicant", "name of insurance company", "policy number", and "type of insurance", the file type of the image file can be determined to be an insurance policy.
  • It should be pointed out that a key field name is a specific field name that identifies the image file.
  • There can be one or more such field names, and their number can be determined according to the actual classification situation.
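  • The classification in step S320 can be sketched as a subset test over key field names. This is a minimal illustration, not the method's actual implementation: the `KEY_FIELDS` table, the ID card entry, and the function name `classify_file_type` are assumptions made for the example; only the four insurance policy key field names come from the text above.

```python
# Hypothetical table of key field names per file type (illustrative only).
KEY_FIELDS = {
    "insurance policy": {
        "name of applicant",
        "name of insurance company",
        "policy number",
        "type of insurance",
    },
    "ID card": {"name", "ID number"},
}


def classify_file_type(recognized_field_names):
    """Return the first file type whose key field names all occur in the
    recognized text information, or None if no file type matches."""
    found = {name.strip().lower() for name in recognized_field_names}
    for file_type, keys in KEY_FIELDS.items():
        if {key.lower() for key in keys} <= found:
            return file_type
    return None


fields = [
    "Name of applicant",
    "Name of insurance company",
    "Policy number",
    "Type of insurance",
    "Insurance amount",
]
print(classify_file_type(fields))
```

  • Matching is case-insensitive here because OCR output capitalization may vary; that choice is also an assumption.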
  • The verification file is a file for verifying the character data contained in the image file, and its file type is associated with the file type of the image file and the business type.
  • The mapping relationship between the file type of the image file and the business type determines the file type of the verification file used to verify the image file.
  • For example, in the policy loan business, the image files that need to be verified include the insurance policy image file, the ID card image file, and the loan note image file input by the user.
  • A verification file is determined for each of these image files. For the policy image file input by the user in the policy loan business, it can be determined from the mapping relationship that the insurance company's real policy file must be used to verify the policy image file input by the user.
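  • The mapping relationship described above can be pictured as a lookup keyed by business type and image file type. The sketch below is an assumption-laden illustration: the table name, the function name, and the ID card entry are hypothetical; only the policy loan and insurance policy pairing is taken from the text.

```python
# Hypothetical mapping: (business type, image file type) -> verification file type.
VERIFICATION_FILE_TYPES = {
    ("policy loan", "insurance policy"): "real policy file of the insurance company",
    # Illustrative entry, not stated in the source:
    ("policy loan", "ID card"): "identity record",
}


def target_verification_file_type(business_type, image_file_type):
    """Look up the file type of the verification file, or None if unmapped."""
    return VERIFICATION_FILE_TYPES.get((business_type, image_file_type))


print(target_verification_file_type("policy loan", "insurance policy"))
```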
  • The data source identifier may be the identification information of the external or local data server that stores the verification file. The file identifier of the verification file is the unique identification information for that file, such as a data ticket number.
  • The data source identifier and the file identifier of the verification file can be determined from the character data in the image file.
  • For example, OCR character recognition can be performed on the policy image file input by the user to obtain the recognized text information, which includes all the character data in the policy image file.
  • The field value data "Ping An Insurance Company of China" under the field name "insurance company name" in the recognized text information is used as the data source identifier of the policy file to be verified.
  • The field value data "5485426232" under the field name "insurance policy number" in the recognized text information is used as the file identifier of the policy file to be verified, so that the policy file can be obtained according to its data source identifier and file identifier.
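  • As a rough sketch of the paragraph above, the two identifiers can be read directly from the recognized field value data. The function name `locate_verification_file` and its parameters are hypothetical, and the dictionary is assumed to be the result of OCR recognition; the field names and values follow the policy example.

```python
def locate_verification_file(recognized_fields,
                             source_field="insurance company name",
                             id_field="insurance policy number"):
    """Return (data source identifier, file identifier) taken from the
    recognized text information; raise if either value is absent."""
    missing = [f for f in (source_field, id_field) if not recognized_fields.get(f)]
    if missing:
        raise ValueError(f"missing field value data for: {', '.join(missing)}")
    return recognized_fields[source_field], recognized_fields[id_field]


recognized = {
    "insurance company name": "Ping An Insurance Company of China",
    "insurance policy number": "5485426232",
}
source_id, file_id = locate_verification_file(recognized)
print(source_id, file_id)
```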
  • In step S230, the business type and the file type of the image file are input into the pre-trained first machine learning model, which outputs the first field name that needs to be verified in the image file of that file type.
  • The pre-trained first machine learning model is obtained by training on sample data that includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file.
  • The first field name is the name of a field in the image file whose field value data needs to be verified. It should be pointed out that the field names to be verified differ with the business being handled and with the file type of the image file: the field names that need to be verified in an image file are associated with the business type and with the file type of the image file. For example, in the policy loan business in the loan scenario, when the image file input by the user is an ID card image file, the first field names to be verified on the ID card image file are "name" and "ID number"; that is, only the field value data under these two first field names needs to be verified.
  • The business type of the target business and the file type of each image file input by the user can be input into the pre-trained first machine learning model to determine the first field name that needs to be verified in each image file.
  • The field names that need to be verified can be all of the field names contained in the image file or only some of them.
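  • To illustrate only the input/output contract of step S230, the sketch below substitutes a plain lookup table built from sample data for the machine learning model. It is deliberately not a CNN or deep neural network, and the names `fit_field_lookup` and `predict_first_fields` are hypothetical.

```python
from collections import defaultdict


def fit_field_lookup(samples):
    """Build a table from samples of (business type, file type, field names).
    A stand-in for training; a real system would fit a learned model."""
    table = defaultdict(set)
    for business_type, file_type, field_names in samples:
        table[(business_type, file_type)].update(field_names)
    return dict(table)


def predict_first_fields(table, business_type, file_type):
    """Return the first field names to verify for this business and file type."""
    return sorted(table.get((business_type, file_type), set()))


samples = [("policy loan", "ID card", ["name", "ID number"])]
table = fit_field_lookup(samples)
print(predict_first_fields(table, "policy loan", "ID card"))
```

  • Unlike a trained model, a lookup table cannot generalize to unseen combinations of business type and file type; it returns an empty list for them.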
  • FIG. 4 is a flowchart of a data verification method shown in an exemplary embodiment of this application, which may include steps S410 to S420, which are described in detail as follows.
  • In step S410, the training set sample data used for training the first machine learning model is obtained. Each piece of sample data in the training set includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file.
  • The pre-trained first machine learning model is obtained by training a machine learning model on training sample data.
  • The first machine learning model may be a convolutional neural network (CNN) model or another deep neural network model.
  • The training process of the first machine learning model is as follows. The training set sample data is obtained; each piece of sample data includes the business type of an existing target business, the file type of each image file that the existing target business requires to be verified, and the first field name that needs to be verified in each image file.
  • step S420 the first machine learning model to be trained is trained using the training set sample data to obtain the first machine learning model after training.
  • the first machine learning model is trained based on the acquired training set sample data to obtain the trained first machine learning model.
  • FIG. 5 is a flowchart of a data verification method shown in an exemplary embodiment of this application, which may include steps S510 to S530, which are described in detail as follows.
  • In step S510, the test set sample data used for verifying the trained first machine learning model is obtained. Each piece of sample data in the test set includes the business type, the file type of the image file, and the first field name that needs to be verified in the image file.
  • The trained first machine learning model can also be verified with test sample data.
  • To do so, the test set sample data is obtained.
  • Each piece of sample data in the test set likewise includes the business type of an existing target business, the file type of each image file that the existing target business requires to be verified, and the first field name that needs to be verified in each image file.
  • In step S520, the business type and the file type of the image file in each piece of test set sample data are input into the trained first machine learning model, which outputs the predicted first field name that needs to be verified in the image file.
  • In step S530, the pieces of sample data in which the known first field name to be verified in the image file is the same as the predicted first field name are counted. If the proportion of these pieces among all the sample data in the test set exceeds a predetermined proportion threshold, the trained first machine learning model is identified as the pre-trained first machine learning model.
  • In other words, for each piece of test set sample data, the known field names that need to be verified in an image file of the given file type are compared with the field names predicted for an image file of that file type.
  • If the proportion of matching pieces of sample data in the test set exceeds the predetermined proportion threshold, the verification passes; otherwise it fails, and the first machine learning model continues to be trained until the verification passes.
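  • The acceptance test in step S530 amounts to comparing an exact-match rate against the threshold. The following is a minimal sketch under stated assumptions: the function name `passes_acceptance` is hypothetical, field names are compared as unordered sets, and "exceeds" is read as strictly greater than.

```python
def passes_acceptance(expected, predicted, threshold):
    """expected/predicted: one collection of field names per sample.
    Returns True if the share of exactly matching samples exceeds threshold."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and aligned")
    matches = sum(set(e) == set(p) for e, p in zip(expected, predicted))
    return matches / len(expected) > threshold


expected = [["name", "ID number"], ["policy number"], ["insurance amount"]]
predicted = [["ID number", "name"], ["policy number"], ["name"]]
print(passes_acceptance(expected, predicted, threshold=0.6))  # 2 of 3 samples match
```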
  • In step S240, the file type of the image file, the business type, the file type of the target verification file, and the first field name are input into the pre-trained second machine learning model.
  • The model outputs the second field name in the target verification file used to verify the field value data under the first field name.
  • The pre-trained second machine learning model is obtained by training on sample data that includes the business type, the file type of the target verification file, and the second field name in the target verification file used to verify the field value data under the first field name. The field value data under the second field name is used to verify the field value data under the first field name.
  • The second field name to be used in the target verification file differs with these inputs.
  • For example, the first field names corresponding to the field value data that needs to be verified in the loan note image file include "Lender name", "Lender ID", and "Lender's mobile phone number".
  • The second field names in the insurance policy used to verify the field value data under these first field names include "Insured name", "Insured's ID card", and "Insured's mobile phone number".
  • The field value data under the second field name "Insured name" is used to verify the field value data under the first field name "Lender name".
  • The field value data under the second field name "Insured's ID card" is used to verify the field value data under the first field name "Lender ID".
  • The field value data under the second field name "Insured's mobile phone number" is used to verify the field value data under the first field name "Lender's mobile phone number".
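  • The pairing produced in step S240 can be pictured as a function from four inputs to one second field name. The sketch below replaces the second machine learning model with a fixed table, purely to show the contract; the table, the function name, and the normalized field-name spellings are assumptions based on the loan note example.

```python
# Hypothetical stand-in for the second model:
# (image file type, business type, verification file type, first field name)
#   -> second field name.
SECOND_FIELD = {
    ("loan note", "policy loan", "insurance policy", "Lender name"): "Insured name",
    ("loan note", "policy loan", "insurance policy", "Lender ID"): "Insured's ID card",
    ("loan note", "policy loan", "insurance policy",
     "Lender's mobile phone number"): "Insured's mobile phone number",
}


def second_field_name(image_file_type, business_type,
                      verification_file_type, first_field_name):
    """Return the second field name used to check first_field_name, or None."""
    key = (image_file_type, business_type, verification_file_type, first_field_name)
    return SECOND_FIELD.get(key)


print(second_field_name("loan note", "policy loan", "insurance policy", "Lender ID"))
```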
  • The second machine learning model may be a convolutional neural network (CNN) model or another deep neural network model.
  • The training sample data of the second machine learning model includes the business type, the file type of the verification file, and the second field name in the verification file used to verify the field value data under the first field name.
  • The field value data under the second field name is used to verify the field value data under the first field name. Because the training process of the second machine learning model is similar to that of the first machine learning model, it is not repeated here.
  • In step S250, the field value data under the first field name is obtained according to the first field name, and the target verification file is obtained according to the data source information and the file identifier of the target verification file.
  • The field value data under the first field name that needs to be verified can be obtained from the corresponding character data in the image file, according to the first field name, and used as the field value data to be verified.
  • The target server that stores the target verification file can be determined from the data source information of the target verification file, and the required target verification file can then be obtained from that server according to the file identifier.
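  • The retrieval in step S250 can be sketched as a two-level lookup: first the data source, then the file within it. In this illustration an in-memory dictionary stands in for the external or local data servers; `DATA_SOURCES`, `fetch_verification_file`, and the stored field values are hypothetical.

```python
# Hypothetical in-memory stand-in for data servers keyed by data source identifier.
DATA_SOURCES = {
    "Ping An Insurance Company of China": {
        "5485426232": {"Insured name": "Zhang San", "Insurance amount": "10000.00"},
    },
}


def fetch_verification_file(data_source_id, file_id, sources=DATA_SOURCES):
    """Return the verification file's fields; raise LookupError if not found."""
    source = sources.get(data_source_id)
    if source is None:
        raise LookupError(f"unknown data source: {data_source_id}")
    if file_id not in source:
        raise LookupError(f"file {file_id} not found in {data_source_id}")
    return source[file_id]


policy = fetch_verification_file("Ping An Insurance Company of China", "5485426232")
print(policy["Insured name"])
```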
  • in step S260, the field value data in the first field name is verified based on the field value data in the second field name in the target verification file.
  • the field value data in the second field name is checked against the field value data in the first field name that needs to be verified in the image file.
  • verifying the field value data in the first field name that needs to be verified in the image file ensures that each image file is verified accurately, which improves the accuracy of verification; in addition, because verification is performed only on the field value data in the first field name that needs to be verified, verifying the field value data in every field name contained in the image file is avoided, which improves the efficiency of verification.
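  • the comparison in step S260 can be illustrated with a short sketch, assuming Python. The function names, the mapping from first field names to second field names, and the whitespace and case normalization are assumptions for illustration; the source does not specify how two field values are compared.

```python
from typing import Dict


def normalize(value: str) -> str:
    """Illustrative normalization: collapse whitespace and ignore case."""
    return " ".join(value.split()).casefold()


def verify_fields(image_values: Dict[str, str],
                  target_file: Dict[str, str],
                  field_mapping: Dict[str, str]) -> Dict[str, bool]:
    """Check each first field value against the mapped second field value.

    field_mapping maps a first field name (image file) to the second field
    name (target verification file) that verifies it.
    """
    results: Dict[str, bool] = {}
    for first_name, second_name in field_mapping.items():
        actual = image_values.get(first_name)
        expected = target_file.get(second_name)
        results[first_name] = (
            actual is not None
            and expected is not None
            and normalize(actual) == normalize(expected)
        )
    return results
```

  • only the fields listed in the mapping are compared, which mirrors the point above that fields not requiring verification are skipped.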
  • the pre-trained first machine learning model can quickly determine, according to the business type of the target business and the image file that needs to be verified, the first field name that needs to be verified in each image file, which avoids verifying the field value data in other field names of the image file that do not need to be verified;
  • the pre-trained second machine learning model can determine, according to the file type of the image file, the business type, the file type of the target verification file, and the first field name, the second field name in the target verification file that is used to verify the field value data in the first field name, so that the verification file to be used and the field value data in the second field name that effectively checks the field value data in the first field name are determined quickly and accurately; this achieves rapid and accurate verification of each image file while ensuring the accuracy of the verification result.
  • even when there are multiple business types and multiple image files, only the training data of the pre-trained machine learning models needs to be adjusted.
  • following step S250, the method may further include the step of: obtaining the verification result of verifying the field value data in the first field name based on the field value data in the second field name in the target verification file, and displaying the verification result.
  • when the verification result is displayed, it can be imported into the corresponding display document template, selected according to the file type of the verification file and the correspondence between the file type of the image file input by the user and the display document template, to generate a display document, so that the corresponding verification result can be viewed more conveniently and intuitively.
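  • selecting a display document template can be sketched as a simple lookup, assuming Python. The template table, its keys, and the rendered text are hypothetical; the source only says that a template is chosen from the file types and the result is imported into it.

```python
from string import Template
from typing import Dict, Tuple

# Hypothetical correspondence: (verification file type, image file type) -> template.
TEMPLATES: Dict[Tuple[str, str], Template] = {
    ("household_register", "id_card"): Template(
        "ID card check against household register\n$rows"
    ),
}
DEFAULT_TEMPLATE = Template("Verification result\n$rows")


def render_result(verification_file_type: str, image_file_type: str,
                  results: Dict[str, bool]) -> str:
    """Import the per-field results into the matching display template."""
    template = TEMPLATES.get((verification_file_type, image_file_type),
                             DEFAULT_TEMPLATE)
    rows = "\n".join(
        f"{name}: {'match' if ok else 'mismatch'}" for name, ok in results.items()
    )
    return template.substitute(rows=rows)
```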
  • FIG. 6 is a block diagram of a data verification device shown in an exemplary embodiment of the present application.
  • the data verification device 600 may be integrated in the above-mentioned client, and may specifically include a first acquiring unit 610, a first execution unit 620, a second execution unit 630, a third execution unit 640, a second acquiring unit 650, and a verification unit 660.
  • the first acquiring unit 610 is used to acquire the business type of the target business and the image file of the target business that needs to be verified;
  • the first execution unit 620 is used to determine the file type of the image file according to the image file, determine the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determine the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file;
  • the second execution unit 630 is used to input the business type and the file type of the image file into the pre-trained first machine learning model, and output the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified;
  • the third execution unit 640 is used to input the file type of the image file, the business type, the file type of the target verification file, and the first field name into the pre-trained second machine learning model, and output the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name;
  • the second acquiring unit 650 is used to acquire the field value data in the first field name according to the first field name, and acquire the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
  • the verification unit 660 is used to verify the field value data in the first field name based on the field value data in the second field name in the target verification file.
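  • how the units might be wired together can be sketched as follows, assuming Python. The class name, method signature, and the use of plain callables in place of the two pre-trained models and the file retrieval are assumptions for illustration, not the structure of device 600 itself; the comparison is a plain equality check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataVerifier:
    # Stand-ins for the pre-trained models and for file retrieval (hypothetical).
    first_model: Callable[[str, str], List[str]]       # (business type, image file type) -> first field names
    second_model: Callable[[str, str, str, str], str]  # (image file type, business type, target file type, first field name) -> second field name
    fetch_target_file: Callable[[str, str], Dict[str, str]]  # (data source, file id) -> fields

    def verify(self, business_type: str, image_file_type: str,
               image_values: Dict[str, str], target_file_type: str,
               data_source: str, file_id: str) -> Dict[str, bool]:
        """Run the flow: pick fields, map them, fetch the file, compare values."""
        first_names = self.first_model(business_type, image_file_type)
        target = self.fetch_target_file(data_source, file_id)
        results: Dict[str, bool] = {}
        for first in first_names:
            second = self.second_model(image_file_type, business_type,
                                       target_file_type, first)
            results[first] = (first in image_values
                              and image_values[first] == target.get(second))
        return results
```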
  • the first execution unit includes: a recognition sub-unit for performing OCR character recognition on the image file to obtain recognized text information; and an execution sub-unit for determining the file type of the image file according to the field names in the recognized text information.
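  • determining the file type from recognized field names can be sketched as follows, assuming Python and assuming the OCR step has already produced plain text. The table of file types and their field names is hypothetical, and the overlap score is one possible rule, not one stated in the source. Field names are found by naive substring matching.

```python
from typing import Dict, Optional, Set

# Hypothetical: field names expected on each known file type.
FILE_TYPE_FIELDS: Dict[str, Set[str]] = {
    "id_card": {"name", "gender", "date of birth", "id number"},
    "business_license": {"company name", "legal representative", "registered capital"},
}


def extract_field_names(recognized_text: str) -> Set[str]:
    """Return the known field names that occur in the recognized text."""
    text = recognized_text.casefold()
    known = set().union(*FILE_TYPE_FIELDS.values())
    return {name for name in known if name in text}


def determine_file_type(recognized_text: str) -> Optional[str]:
    """Pick the file type whose field names are best covered by the text."""
    found = extract_field_names(recognized_text)
    best_type, best_score = None, 0.0
    for file_type, fields in FILE_TYPE_FIELDS.items():
        score = len(found & fields) / len(fields)
        if score > best_score:
            best_type, best_score = file_type, score
    return best_type  # None when no known field name was recognized
```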
  • the data verification device further includes: a display unit, configured to obtain the verification result of verifying the field value data in the first field name based on the field value data in the second field name in the target verification file, and to display the verification result.
  • the data verification device further includes: a third acquiring unit, configured to acquire training set sample data used for training the first machine learning model to be trained, each piece of sample data in the training set sample data including the business type, the file type of the image file, and the first field name in the image file that needs to be verified; and a training unit, configured to train the first machine learning model to be trained with the training set sample data to obtain the trained first machine learning model.
  • the data verification device further includes: a fourth acquiring unit, configured to acquire test set sample data used to verify the trained first machine learning model, each piece of sample data in the test set sample data including the business type, the file type of the image file, and the first field name in the image file that needs to be verified; a fourth execution unit, configured to input the business type and the file type of the image file of each piece of sample data in the test set sample data into the trained first machine learning model, and output the predicted first field name that needs to be verified in the image file; and a detection unit, configured to identify the trained first machine learning model as the pre-trained first machine learning model if, among the total number of pieces of sample data in the test set sample data, the proportion of pieces for which the first field name that needs to be verified in the image file is consistent with the predicted first field name exceeds a predetermined ratio threshold.
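  • the acceptance check performed by the detection unit can be sketched as follows, assuming Python. The function name and the representation of a sample as a tuple are hypothetical; the strict "greater than" comparison follows the wording that the proportion must exceed the predetermined ratio threshold.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# (business type, image file type, expected first field names)
Sample = Tuple[str, str, FrozenSet[str]]


def passes_acceptance(predict: Callable[[str, str], Iterable[str]],
                      test_samples: Iterable[Sample],
                      threshold: float) -> bool:
    """Accept the trained model if its share of exact matches exceeds the threshold."""
    samples = list(test_samples)
    if not samples:
        return False  # nothing to evaluate, so do not accept
    correct = sum(
        1 for business, file_type, expected in samples
        if frozenset(predict(business, file_type)) == expected
    )
    return correct / len(samples) > threshold
```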
  • the example embodiments described here can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network, and which includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
  • an electronic device capable of implementing the above method is also provided.
  • FIG. 7 is an exemplary block diagram of an electronic device for implementing the foregoing data verification method according to an exemplary embodiment of the application.
  • the electronic device 700 shown in FIG. 7 is only an example, and should not bring any limitation to the functions and scope of use of the embodiments of the present application.
  • the electronic device 700 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 700 may include, but are not limited to: the aforementioned at least one processing unit 710, the aforementioned at least one storage unit 720, and a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710).
  • the storage unit stores program code, and the program code can be executed by the processing unit 710, so that the processing unit 710 executes the steps of the various exemplary implementations of the present application described in the “Exemplary Method” section of this specification.
  • the processing unit 710 may perform the following steps:
  • acquiring the business type of the target business and the image file of the target business that needs to be verified;
  • determining the file type of the image file according to the image file, determining the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determining the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file;
  • inputting the business type and the file type of the image file into the pre-trained first machine learning model, and outputting the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified;
  • inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into the pre-trained second machine learning model, and outputting the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name;
  • acquiring the field value data in the first field name according to the first field name, and acquiring the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
  • verifying the field value data in the first field name based on the field value data in the second field name in the target verification file.
  • the storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 7201 and/or a cache storage unit 7202, and may further include a read-only storage unit (ROM) 7203.
  • the storage unit 720 may also include a program/utility tool 7204 having a set of (at least one) program module 7205.
  • the program module 7205 includes but is not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the bus 730 may represent one or more of several types of bus structures, including a storage unit bus or a storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of multiple bus structures.
  • the electronic device 700 may also communicate with one or more external devices 900 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may be performed through an input/output (I/O) interface 740.
  • the electronic device 700 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 760.
  • the network adapter 760 communicates with other modules of the electronic device 700 through the bus 730. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the example embodiments described here can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network, and which includes several instructions that cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
  • a computer-readable storage medium is also provided.
  • the computer-readable storage medium may be volatile or non-volatile, and stores a program product capable of implementing the method described above.
  • the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:
  • acquiring the business type of the target business and the image file of the target business that needs to be verified;
  • determining the file type of the image file according to the image file, determining the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determining the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file;
  • inputting the business type and the file type of the image file into the pre-trained first machine learning model, and outputting the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified;
  • inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into the pre-trained second machine learning model, and outputting the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name;
  • acquiring the field value data in the first field name according to the first field name, and acquiring the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
  • verifying the field value data in the first field name based on the field value data in the second field name in the target verification file.
  • various aspects of the present application can also be implemented in the form of a program product, which includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present application described in the above-mentioned "Exemplary Method" section of this specification.
  • FIG. 8 is a computer-readable storage medium for implementing the above-mentioned data verification method according to an exemplary embodiment of the present application.
  • FIG. 8 depicts a program product 800 for implementing the above method according to an embodiment of the present application, which may adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on an electronic device such as a personal computer.
  • the program product of this application is not limited to this.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present application can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • all the above-mentioned data can also be stored in a node of a blockchain. For example, the image files, the first field name, the second field name, and so on can be stored in a blockchain node.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
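  • the idea of data blocks associated by cryptographic methods can be illustrated with a minimal hash chain, assuming Python. This is a teaching sketch only: the block layout and function names are hypothetical, and it omits consensus, peer-to-peer transmission, and everything else a real blockchain platform provides.

```python
import hashlib
import json
from typing import Any, Dict, List


def block_hash(block: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: Dict[str, Any]) -> Dict[str, Any]:
    """Append a block that records the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev, "data": data}
    chain.append(block)
    return block


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """A chain is valid if every block stores the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

  • changing the data of an earlier block changes its hash, so the link stored in the next block no longer matches and the tampering is detectable.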

Abstract

Disclosed in the present application are a data checking method and apparatus, an electronic device, and a storage medium, which relate to the technical field of data processing. The data checking method comprises: acquiring the service type of a target service, and an image file, which needs to be checked, in the target service; and determining the file type of the image file according to the image file, determining, according to the service type and the file type of the image file, the file type of a target checking file to be used for checking, and determining a data source identifier of the target checking file and a file identifier of the target checking file according to the image file, wherein the target checking file is a file that checks the image file. According to the technical solution provided in the present application, image files can be quickly and accurately checked.

Description

Data verification method, device, electronic equipment and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 1, 2020, with application number CN202010249650.5 and the invention title "Data verification method, device, electronic equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of data processing technology, and in particular to a data verification method, device, electronic equipment and storage medium.
Background
At present, in the self-service handling of a certain business, in order to improve the efficiency of business handling, the user may only need to upload image files related to the business being handled; the system then checks the image files uploaded by the user against genuine verification files to ensure that the uploaded image files are authentic and valid, which in turn facilitates the smooth handling of the business.
The inventor realized that, as the number of business types increases and the number of image files that need to be verified for each business increases, the prior art lacks a mechanism for quickly and accurately verifying each image file.
Summary of the invention
A data verification method, including: acquiring the business type of a target business and the image file of the target business that needs to be verified; determining the file type of the image file according to the image file, determining the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determining the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file; inputting the business type and the file type of the image file into a pre-trained first machine learning model, and outputting the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified; inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and outputting the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name; acquiring the field value data in the first field name according to the first field name, and acquiring the target verification file according to the data source information of the target verification file and the file identifier of the target verification file; and verifying the field value data in the first field name based on the field value data in the second field name in the target verification file.
A data verification device, including: a first acquiring unit, used to acquire the business type of a target business and the image file of the target business that needs to be verified; a first execution unit, used to determine the file type of the image file according to the image file, determine the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determine the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file; a second execution unit, used to input the business type and the file type of the image file into a pre-trained first machine learning model, and output the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified; a third execution unit, used to input the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and output the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name; a second acquiring unit, used to acquire the field value data in the first field name according to the first field name, and acquire the target verification file according to the data source information of the target verification file and the file identifier of the target verification file; and a verification unit, used to verify the field value data in the first field name based on the field value data in the second field name in the target verification file.
An electronic device, including a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
acquiring the business type of a target business and the image file of the target business that needs to be verified; determining the file type of the image file according to the image file, determining the file type of the target verification file to be used for verification according to the business type and the file type of the image file, and determining the data source identifier of the target verification file and the file identifier of the target verification file according to the image file, wherein the target verification file is a file for verifying the image file; inputting the business type and the file type of the image file into a pre-trained first machine learning model, and outputting the first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is obtained by training on sample data containing the business type, the file type of the image file, and the first field name in the image file that needs to be verified; inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and outputting the second field name in the target verification file that is used to verify the field value data in the first field name, wherein the pre-trained second machine learning model is obtained by training on sample data containing the business type, the file type of the target verification file, and the second field name in the target verification file that verifies the field value data in the first field name, and the field value data in the second field name is used to verify the field value data in the first field name; acquiring the field value data in the first field name according to the first field name, and acquiring the target verification file according to the data source information of the target verification file and the file identifier of the target verification file; and verifying the field value data in the first field name based on the field value data in the second field name in the target verification file.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
Obtain the business type of the target service and the image file of the target service that needs to be verified; determine the file type of the image file from the image file, determine the file type of the target verification file required for the verification from the business type and the file type of the image file, and determine the data source identifier and the file identifier of the target verification file from the image file, where the target verification file is a file used to verify the image file; input the business type and the file type of the image file into a pre-trained first machine learning model, and obtain as output the first field name that needs to be verified in the image file of that file type, the pre-trained first machine learning model being trained on sample data that contains business types, file types of image files, and the first field names that need to be verified in those image files; input the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and obtain as output the second field name in the target verification file that is used to verify the field value data under the first field name, the pre-trained second machine learning model being trained on sample data that contains business types, file types of target verification files, and the second field names in those target verification files that are used to verify the field value data under the first field names, where the field value data under the second field name is used to verify the field value data under the first field name; obtain the field value data under the first field name according to the first field name, and obtain the target verification file according to the data source information and the file identifier of the target verification file; verify the field value data under the first field name against the field value data under the second field name in the target verification file.
This application makes it possible to verify each image file quickly and accurately.
Description of the Drawings
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
FIG. 2 is a flowchart of a data verification method according to an exemplary embodiment of this application.
FIG. 3 is a detailed flowchart of step S220 of the data verification method according to an exemplary embodiment of this application.
FIG. 4 is a flowchart of a data verification method according to an exemplary embodiment of this application.
FIG. 5 is a flowchart of a data verification method according to an exemplary embodiment of this application.
FIG. 6 is a block diagram of a data verification apparatus according to an exemplary embodiment of this application.
FIG. 7 is an example block diagram of an electronic device for implementing the above data verification method according to an exemplary embodiment of this application.
FIG. 8 shows a computer-readable storage medium for implementing the above data verification method according to an exemplary embodiment of this application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of this application. However, those skilled in the art will recognize that the technical solutions of this application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative. They need not include all of the contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps may be decomposed, and others may be merged or partially merged, so the actual order of execution may change according to the actual situation.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
As shown in FIG. 1, the system architecture may include a client (one or more of the smartphone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1, or alternatively a desktop computer and so on), a network 104, and a server 105. The network 104 is the medium that provides communication links between the client and the server 105. The network 104 may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the numbers of clients, networks, and servers in FIG. 1 are merely illustrative. There may be any number of clients, networks, and servers according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers. A user can use the client to interact with the server 105 through the network 104 to receive or send messages. The server 105 may be a server that provides various services, such as a server that provides a data verification service.
Taking the client as the executing entity as an example, the client obtains the business type of the target service and the image file of the target service that needs to be verified; determines the file type of the image file from the image file, determines the file type of the target verification file required for the verification from the business type and the file type of the image file, and determines the data source identifier and the file identifier of the target verification file from the image file, where the target verification file is a file used to verify the image file; inputs the business type and the file type of the image file into a pre-trained first machine learning model, and obtains as output the first field name that needs to be verified in the image file of that file type, the pre-trained first machine learning model being trained on sample data that contains business types, file types of image files, and the first field names that need to be verified in those image files; inputs the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and obtains as output the second field name in the target verification file that is used to verify the field value data under the first field name, the pre-trained second machine learning model being trained on sample data that contains business types, file types of target verification files, and the second field names in those target verification files that are used to verify the field value data under the first field names, where the field value data under the second field name is used to verify the field value data under the first field name; obtains the field value data under the first field name according to the first field name, and obtains the target verification file according to the data source information and the file identifier of the target verification file; and verifies the field value data under the first field name against the field value data under the second field name in the target verification file.
With the pre-trained first machine learning model, the first field name that needs to be verified in each image file can be determined quickly from the business type of the target service and the image files to be verified, which avoids verifying the field value data under other field names in the image file that do not need verification. With the pre-trained second machine learning model, the second field name in the target verification file that is needed to verify the field value data under the first field name can be determined from the file type of the image file, the business type, the file type of the target verification file, and the first field name. The verification file to be used, and the field value data under the second field name that effectively verifies the field value data under the first field name, are thus determined quickly and accurately, so each image file is verified quickly and accurately while the accuracy of the verification result is preserved. In addition, even in scenarios involving many business types and many kinds of image files, only the training data of the pre-trained machine learning models needs to be adjusted for the solution to adapt to more complex business scenarios, so that each image file to be verified in the target service can still be verified quickly and accurately.
It should be noted that the data verification method provided in the embodiments of this application is generally executed by the client, and accordingly the data verification apparatus is generally provided in the client. In other embodiments of this application, however, the server 105 may have functions similar to those of the client and thus execute the data verification method provided in the embodiments of this application. The implementation details of the technical solutions of the embodiments of this application are described in detail below.
Referring to FIG. 2, which is a flowchart of a data verification method according to an exemplary embodiment of this application, the executing entity of the data verification method in this embodiment is the client. The data verification method may include steps S210 to S260, described in detail below.
In step S210, the business type of the target service and the image file of the target service that needs to be verified are obtained.
In one embodiment, the target service is a specific service that a user can apply for. In a loan scenario, for example, there may be services of different business types such as policy loans, home-mortgage loans, and personal housing loans. The image files to be verified for the target service are the image files that the user submits when applying for that service. For example, when applying for a specific service, the user can enter the business type and the image files to be verified through the virtual buttons provided on the client's service handling page. There may be one or more image files, and their number can be determined by the actual requirements of the service.
In step S220, the file type of the image file is determined from the image file, the file type of the target verification file required for the verification is determined from the business type and the file type of the image file, and the data source identifier and the file identifier of the target verification file are determined from the image file, where the target verification file is a file used to verify the image file.
In one embodiment, the file type of the image file is the file type determined by recognizing the image file. The file types of image files differ between business scenarios. In a loan scenario, for example, the file type of an image file may be an ID card, an insurance policy, a property ownership certificate, a mortgage contract, and so on. The file type of an image file can be determined from the character data contained in the image file.
Referring to FIG. 3, which is a detailed flowchart of step S220 of the data verification method according to an exemplary embodiment of this application, step S220 may include steps S310 to S320, described in detail below.
Step S310: perform OCR character recognition on the image file to obtain recognized text information.
In one embodiment, when determining the file type of an image file, OCR character recognition can first be performed on the image file to obtain recognized text information. The recognized text information is the set of character data obtained by recognizing all the character data in the image file. This set contains the character string of each field name in the image file and the character string of the field value data under each field name. In a loan scenario, for example, if the image file to be verified is an insurance policy image, the set of character data obtained by OCR contains field name strings such as "policyholder name", "insured amount", "insurance company name", and "policy number". The field value string under "policyholder name" may be "Zhang San", the one under "insured amount" may be "10000.00", the one under "insurance company name" may be "Ping An Insurance Company of China", and the one under "policy number" may be "5485426232".
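The structure of the recognized text information can be illustrated with a minimal sketch. It assumes, purely for illustration, that the OCR engine has already returned plain text with one "field name: field value" pair per line; the function name `parse_ocr_text` and that line layout are hypothetical and are not specified in this application.

```python
def parse_ocr_text(text: str) -> dict[str, str]:
    """Split recognized text into a {field name: field value} mapping.

    Assumption (hypothetical): each line has the form "name: value".
    Lines without a colon or without a name are skipped.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


# Recognized text for the policy image used as the example above.
sample_text = """policyholder name: Zhang San
insured amount: 10000.00
insurance company name: Ping An Insurance Company of China
policy number: 5485426232"""

fields = parse_ocr_text(sample_text)
```

The resulting mapping keeps field names and field values separate, which is the form the later steps rely on when they look up the value under a given field name.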
Step S320: determine the file type of the image file according to the key field names contained in the recognized text information.
In one embodiment, because the field names contained in different file types differ, image files can be classified by the key field names that distinguish them, and the file type of each image file can be determined in this way. In a loan scenario, for example, if the text information recognized from an image file contains the four key field names "policyholder name", "insurance company name", "policy number", and "insurance type", the file type of the image file can be determined to be an insurance policy. Note that a key field name is a specific field name that characterizes the image file. There may be one or several such field names, and their number can be chosen according to the actual classification needs.
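The key-field-name classification of step S320 can be sketched as a subset test. The policy entry follows the example above; the ID card entry and the names `KEY_FIELDS` and `classify_file_type` are assumptions made only for illustration.

```python
# Key field names per file type. The "insurance policy" entry follows the
# example in the text; the "ID card" entry is a hypothetical addition.
KEY_FIELDS = {
    "insurance policy": {
        "policyholder name",
        "insurance company name",
        "policy number",
        "insurance type",
    },
    "ID card": {"name", "ID number"},
}


def classify_file_type(recognized_field_names):
    """Return the first file type whose key field names all appear in the
    recognized text, or None when no file type matches."""
    names = set(recognized_field_names)
    for file_type, key_names in KEY_FIELDS.items():
        if key_names <= names:  # every key field name was recognized
            return file_type
    return None


policy_type = classify_file_type(
    ["policyholder name", "insured amount", "insurance company name",
     "policy number", "insurance type"]
)
```

Extra field names such as "insured amount" do not affect the result, because only the presence of the key field names is tested.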
Referring again to FIG. 2, in one embodiment, the verification file is a file used to verify the character data contained in the image file, and there is a mapping between the file type of the verification file on the one hand and the file type of the image file together with the business type on the other. Once the file type of the image file has been obtained, the file type of the verification file used to verify the image file can be determined from the file type of the image file and this mapping.
For example, in the policy loan service of a loan scenario, the image files to be verified include the policy image file, the ID card image file, and the loan note image file submitted by the user. For the policy image file, the mapping indicates that it must be verified against the insurance company's authentic policy file. For the ID card image file, the mapping indicates that it must be verified against the ID card file stored by the Ministry of Public Security. For the loan note image file, the mapping indicates that it must be verified against part of the character data in the insurance company's authentic policy file.
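The mapping described above can be sketched as a lookup table keyed by business type and image file type. The three entries restate the policy loan example; the labels used for the verification file types and the function name are hypothetical.

```python
# (business type, image file type) -> verification file type.
# Entries restate the policy loan example; the labels are illustrative.
VERIFICATION_FILE_TYPE = {
    ("policy loan", "insurance policy"): "insurer policy file",
    ("policy loan", "ID card"): "Ministry of Public Security ID card file",
    ("policy loan", "loan note"): "insurer policy file",
}


def verification_file_type(business_type: str, image_file_type: str) -> str:
    """Look up which type of verification file checks the given image file."""
    key = (business_type, image_file_type)
    if key not in VERIFICATION_FILE_TYPE:
        raise KeyError(f"no verification file type configured for {key}")
    return VERIFICATION_FILE_TYPE[key]


id_card_source = verification_file_type("policy loan", "ID card")
```

Keying on both values reflects the point that the same image file type may need a different verification file under a different business type.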
In one embodiment, the data source identifier may be the identification information of the external or local data server that stores the verification file, and the file identifier of the verification file is the unique identification information used to identify the verification file, such as a record number. To obtain the verification file, its data source identifier and file identifier must also be determined. Specifically, they can be determined from the character data in the image file.
For example, in the policy loan service of a loan scenario, the verification file for the policy image file submitted by the user is itself a policy file. To obtain it, OCR character recognition can be performed on the submitted policy image file to obtain recognized text information that includes all the character data in the policy image file. The field value data "Ping An Insurance Company of China" under the field name "insurance company name" is then used as the data source identifier of the verification policy file, and the field value data "5485426232" under the field name "policy number" is used as its file identifier, so that the verification policy file can be retrieved by its data source identifier and file identifier.
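For the policy example above, deriving the two identifiers amounts to reading two field values from the recognized text. The sketch below assumes the recognized text is already available as a field-name-to-value mapping; the function name and the returned keys are hypothetical.

```python
def locate_policy_verification_file(fields: dict[str, str]) -> dict[str, str]:
    """Derive the data source identifier and file identifier of the
    verification policy file from the recognized policy fields."""
    return {
        # The insurer named on the policy identifies where the file is stored.
        "data_source_id": fields["insurance company name"],
        # The policy number uniquely identifies the file at that source.
        "file_id": fields["policy number"],
    }


recognized = {
    "policyholder name": "Zhang San",
    "insurance company name": "Ping An Insurance Company of China",
    "policy number": "5485426232",
}
location = locate_policy_verification_file(recognized)
```

How the file is then fetched from the identified data source is not shown, since the text does not describe a retrieval interface.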
In step S230, the business type and the file type of the image file are input into the pre-trained first machine learning model, which outputs the first field name that needs to be verified in the image file of that file type. The pre-trained first machine learning model is trained on sample data that contains business types, file types of image files, and the first field names that need to be verified in those image files.
In one embodiment, a first field name is a field name in the image file whose field value data needs to be verified. Note that for an image file of a given file type, the field names that need to be verified differ from one service to another: they depend on both the business type being handled and the file type of the image file. For example, when the service is a policy loan and the image file submitted by the user is an ID card image file, the first field names that need to be verified are "name" and "ID number", that is, only the field value data under these two first field names needs to be verified.
When a target service of a given business type is being handled, the first field names that need to be verified in each image file submitted by the user can be determined by inputting the business type of the target service and the file type of each image file into the pre-trained first machine learning model. The field names that need to be verified may be all of the field names contained in the image file, or only some of them.
Referring to FIG. 4, which is a flowchart of a data verification method according to an exemplary embodiment of this application, the method may include steps S410 to S420, described in detail below.
In step S410, training set sample data for training the first machine learning model is obtained. Each piece of sample data in the training set includes a business type, the file type of an image file, and the first field names that need to be verified in the image file.
In one embodiment, the pre-trained first machine learning model is obtained by training a machine learning model on training sample data. The first machine learning model may be a CNN (Convolutional Neural Network) model, a deep neural network model, or the like.
The first machine learning model is trained as follows. Training set sample data is obtained, in which each piece of sample data includes the business type of an existing target service, the file type of each image file that needs to be verified for that service, and the first field names that need to be verified in each image file.
In step S420, the first machine learning model is trained on the training set sample data to obtain the trained first machine learning model.
That is, the first machine learning model is trained on the training set sample data obtained in step S410.
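The shape of the training data and of the trained model's interface can be sketched as follows. This is not the CNN or deep neural network mentioned above: as a deliberately simple stand-in, the sketch memorizes the most frequent set of first field names for each (business type, file type) pair. All names and the sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_first_model(samples):
    """Stand-in for training the first model (NOT a CNN/DNN).

    samples: iterable of (business_type, file_type, first_field_names).
    Returns predict(business_type, file_type) -> frozenset of field names.
    """
    votes = defaultdict(Counter)
    for business_type, file_type, field_names in samples:
        votes[(business_type, file_type)][frozenset(field_names)] += 1

    # Keep the most frequently labelled field-name set for each input pair.
    table = {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

    def predict(business_type, file_type):
        # Unseen input pairs yield an empty set of field names.
        return table.get((business_type, file_type), frozenset())

    return predict


training_set = [
    ("policy loan", "ID card", ["name", "ID number"]),
    ("policy loan", "ID card", ["name", "ID number"]),
    ("policy loan", "insurance policy", ["policyholder name", "policy number"]),
]
predict_first_fields = train_first_model(training_set)
id_card_fields = predict_first_fields("policy loan", "ID card")
```

A learned model would generalize to inputs it has not seen, which this lookup cannot; the sketch only fixes the inputs (business type, file type) and the output (a set of first field names) described in steps S410 and S420.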
Referring to FIG. 5, which is a flowchart of a data verification method according to an exemplary embodiment of this application, the method may include steps S510 to S530, described in detail below.
In step S510, test set sample data for validating the trained first machine learning model is obtained. Each piece of sample data in the test set includes a business type, the file type of an image file, and the first field names that need to be verified in the image file.
To determine whether the first machine learning model meets expectations, the trained first machine learning model can also be validated with test sample data. Specifically, test set sample data can be obtained in which each piece of sample data likewise includes the business type of an existing target service, the file type of each image file that needs to be verified for that service, and the first field names that need to be verified in each image file.
In step S520, the business type and the file type of the image file in each piece of test set sample data are input into the trained first machine learning model, which outputs the predicted first field names that need to be verified in the image file.
The business type and the file type are input into the trained first machine learning model to obtain the predicted field names that need to be verified in an image file of that file type. The trained first machine learning model is then validated by checking whether the known field names that need to be verified, as recorded in the test set sample data for that file type, are consistent with the predicted field names.
在步骤S530中,若所述测试集样本数据中的图像文件中需要进行校验的第一字段名与预测的图像文件中需要进行校验的第一字段名都一致的样本数据条数占所述测试集样本数据中总样本数据条数的比例超过预定比例阈值,则将训练后的第一机器学习模型识别为所述预训练的第一机器学习模型。In step S530, if the first field name in the image file in the test set sample data that needs to be verified is the same as the first field name in the predicted image file that needs to be verified, the number of sample data pieces is all the same. If the proportion of the total number of sample data in the test set sample data exceeds a predetermined proportion threshold, the trained first machine learning model is identified as the pre-trained first machine learning model.
If the number of samples in the training set sample data whose known field names to be verified for the file type are all consistent with the predicted field names, taken as a proportion of the number of samples in the training set sample data, exceeds the predetermined proportion threshold, the check passes. Otherwise the check fails, and training of the first machine learning model continues until the check passes.
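The acceptance rule of step S530 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the sample layout, the `predict` callable, and the function name are assumptions introduced here, and a sample counts as matched only when the predicted set of field names equals the known set exactly.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical sample layout: (business_type, file_type, known_field_names).
Sample = Tuple[str, str, Set[str]]


def passes_acceptance(
    samples: Iterable[Sample],
    predict: Callable[[str, str], Iterable[str]],
    threshold: float,
) -> bool:
    """Return True when the share of fully matched samples exceeds the threshold."""
    samples = list(samples)
    if not samples:
        # Assumption: an empty evaluation set cannot pass the check.
        return False
    matched = sum(
        1
        for business_type, file_type, known in samples
        if set(predict(business_type, file_type)) == known
    )
    # "Exceeds" is read as a strict comparison.
    return matched / len(samples) > threshold
```

If the function returns False, training would continue and the check would be repeated, as the paragraph above describes.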
Referring again to FIG. 2, in step S240, the file type of the image file, the business type, the file type of the target verification file, and the first field name are input into a pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name. The pre-trained second machine learning model is trained on sample data that includes the business type, the file type of the target verification file, and the second field name in the target verification file used to verify the field value data under the first field name. The field value data under the second field name is used to verify the field value data under the first field name.
The field value data under a first field name to be verified in an image file input by the user must be verified against the field value data under a second field name in the target verification file.
The second field name to be used in the target verification file differs depending on the business type, the file type of the image file, the file type of the target verification file, and the first field name corresponding to the field value data to be verified in the image file. To determine this second field name quickly, the file type of the image file, the business type, the file type of the target verification file, and the first field name can be input into the pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name, so that the field value data under the first field name can be verified against the field value data under the second field name.
For example, in a policy-backed loan (保单贷) business in a lending scenario, when a loan form image file is verified, the first field names corresponding to the field value data to be verified in the loan form image file include "贷款人姓名" (loan applicant name), "贷款人身份证" (loan applicant ID card), and "贷款人手机号" (loan applicant mobile number). The target verification file used to verify the loan form image file is the insurance policy, and the second field names in the policy used to verify the field value data under the first field names include "投保人姓名" (policyholder name), "投保人身份证" (policyholder ID card), and "投保人手机号" (policyholder mobile number). The field value data under "投保人姓名" is used to verify the field value data under "贷款人姓名", the field value data under "投保人身份证" is used to verify the field value data under "贷款人身份证", and the field value data under "投保人手机号" is used to verify the field value data under "贷款人手机号".
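The input and output of step S240 for this example can be illustrated with a plain lookup table standing in for the second model. This is only a sketch of the interface: the real system uses a trained model, and the business type and file type labels (`policy_loan`, `loan_form`, `insurance_policy`) and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key: (business_type, image_file_type, target_file_type, first_field_name).
# The table is a stand-in for the trained second model's predictions.
Key = Tuple[str, str, str, str]

SECOND_FIELD_FOR: Dict[Key, str] = {
    ("policy_loan", "loan_form", "insurance_policy", "贷款人姓名"): "投保人姓名",
    ("policy_loan", "loan_form", "insurance_policy", "贷款人身份证"): "投保人身份证",
    ("policy_loan", "loan_form", "insurance_policy", "贷款人手机号"): "投保人手机号",
}


def second_field_name(
    business_type: str,
    image_file_type: str,
    target_file_type: str,
    first_field_name: str,
) -> Optional[str]:
    """Return the second field name for the given inputs, or None if unknown."""
    key = (business_type, image_file_type, target_file_type, first_field_name)
    return SECOND_FIELD_FOR.get(key)
```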
The second machine learning model may be a CNN (Convolutional Neural Network) model, a deep neural network model, or the like. It is trained on sample data that includes the business type, the file type of the verification file, and the second field name in the verification file used to verify the field value data under the first field name; the field value data under the second field name is used to verify the field value data under the first field name. Because the training process of the pre-trained second machine learning model is similar to that of the pre-trained first machine learning model, it is not repeated here.
In step S250, the field value data under the first field name is obtained according to the first field name, and the target verification file is obtained according to the data source information of the target verification file and the file identifier of the target verification file.
In one embodiment, once each first field name to be verified in the image file has been determined, the field value data under that first field name can be obtained from the corresponding character data in the image file and used as the field value data to be verified.
After the data source information and the file identifier of the target verification file are obtained, the target server from which the target verification file is to be fetched can be determined from the data source information, and the required target verification file can then be fetched, according to the file identifier, from the server that stores it.
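The two-stage lookup described above (data source information selects the server, file identifier selects the file) can be sketched with in-memory dictionaries standing in for real servers. The registry contents, the identifiers `policy_db` and `P-001`, and the function name are hypothetical; a real deployment would issue a network or database request instead.

```python
from typing import Dict

# Hypothetical stand-in: data source name -> {file identifier -> parsed file fields}.
SERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "policy_db": {
        "P-001": {"投保人姓名": "张三", "投保人手机号": "13800000000"},
    },
}


def fetch_target_file(data_source: str, file_id: str) -> Dict[str, str]:
    """Resolve the server from the data source, then fetch the file by identifier."""
    server = SERVERS.get(data_source)
    if server is None:
        raise KeyError(f"unknown data source: {data_source!r}")
    if file_id not in server:
        raise KeyError(f"file {file_id!r} not found on {data_source!r}")
    return server[file_id]
```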
In step S260, the field value data under the first field name is verified against the field value data under the second field name in the target verification file.
In one embodiment, after the field value data under the first field name to be verified in the image file and the field value data under the second field name in the target verification file are obtained, the former is verified against the latter. Verifying in this way ensures that every image file can be verified accurately, which improves verification accuracy. In addition, verifying only the field value data under the first field names that need verification avoids verifying the field value data under every field name contained in the image file, which improves verification efficiency.
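A minimal sketch of the comparison in step S260 follows. The source does not define the comparison rule, so exact equality after trimming surrounding whitespace is an assumption made here, as are the function and parameter names. Only the field pairs passed in are compared, which mirrors the point that other fields in the image file are skipped.

```python
from typing import Dict


def verify_fields(
    image_values: Dict[str, str],
    target_values: Dict[str, str],
    field_pairs: Dict[str, str],
) -> Dict[str, bool]:
    """Compare each first-field value with its paired second-field value.

    field_pairs maps a first field name (image file) to a second field name
    (target verification file). A missing value on either side counts as a failure.
    """
    results: Dict[str, bool] = {}
    for first_name, second_name in field_pairs.items():
        image_value = image_values.get(first_name)
        target_value = target_values.get(second_name)
        results[first_name] = (
            image_value is not None
            and target_value is not None
            and image_value.strip() == target_value.strip()
        )
    return results
```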
As can be seen from the above, the pre-trained first machine learning model quickly determines, from the business type of the target business and the image files to be verified, the first field names to be verified in each image file, which avoids verifying field value data under other field names that do not need verification. The pre-trained second machine learning model determines, from the file type of the image file, the business type, the file type of the target verification file, and the first field name, the second field name in the target verification file used to verify the field value data under the first field name. Together, the two models quickly and accurately determine the verification file to use and the field value data under the second field name that effectively verifies the field value data under the first field name, so that each image file is verified quickly and accurately while the accuracy of the verification result is maintained. Moreover, even in scenarios involving multiple business types and multiple kinds of image files, only the training data of the pre-trained machine learning models needs to be adjusted for the solution to adapt to more complex business scenarios and to verify each image file to be verified in the target business quickly and accurately.
In one embodiment, after step S250, the method may further include: obtaining the verification result of verifying the field value data under the first field name against the field value data under the second field name in the target verification file, and displaying the verification result.
When the verification result is displayed, it can be imported into the corresponding display document template, selected according to the correspondence between display document templates and the combination of the file type of the verification file and the file type of the image file input by the user. This generates a display document in which the verification result can be viewed more intuitively.
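Template selection and filling can be sketched with the standard library's `string.Template`. The template text, the file type labels used as keys, and the pass/fail wording are all hypothetical; the source only states that a template is chosen from the two file types and filled with the verification result.

```python
from string import Template
from typing import Dict, Tuple

# Hypothetical mapping: (verification_file_type, image_file_type) -> display template.
TEMPLATES: Dict[Tuple[str, str], Template] = {
    ("insurance_policy", "loan_form"): Template("Loan form vs. policy\n$rows"),
}


def render_report(
    verification_file_type: str,
    image_file_type: str,
    results: Dict[str, bool],
) -> str:
    """Pick the template for the file type pair and fill in one row per field."""
    template = TEMPLATES[(verification_file_type, image_file_type)]
    rows = "\n".join(
        f"{field_name}: {'pass' if ok else 'fail'}"
        for field_name, ok in results.items()
    )
    return template.substitute(rows=rows)
```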
Referring to FIG. 6, FIG. 6 is a block diagram of a data verification apparatus according to an exemplary embodiment of the present application. The data verification apparatus 600 may be integrated in the client described above and may specifically include a first acquiring unit 610, a first execution unit 620, a second execution unit 630, a third execution unit 640, a second acquiring unit 650, and a verification unit 660.
The first acquiring unit 610 is configured to obtain the business type of a target business and the image file that the target business needs to verify. The first execution unit 620 is configured to determine the file type of the image file according to the image file, to determine the file type of the target verification file to be used according to the business type and the file type of the image file, and to determine the data source identifier and the file identifier of the target verification file according to the image file, where the target verification file is a file used to verify the image file. The second execution unit 630 is configured to input the business type and the file type of the image file into a pre-trained first machine learning model, which outputs the first field name to be verified in the image file of that file type; the pre-trained first machine learning model is trained on sample data that includes business types, image file types, and the first field names to be verified in image files. The third execution unit 640 is configured to input the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name; the pre-trained second machine learning model is trained on sample data that includes business types, file types of target verification files, and the second field names in target verification files used to verify the field value data under first field names, and the field value data under the second field name is used to verify the field value data under the first field name. The second acquiring unit 650 is configured to obtain the field value data under the first field name according to the first field name, and to obtain the target verification file according to the data source information and the file identifier of the target verification file. The verification unit 660 is configured to verify the field value data under the first field name against the field value data under the second field name in the target verification file.
In one embodiment, the first execution unit includes: a recognition subunit, configured to perform OCR character recognition on the image file to obtain recognized text information; and an execution subunit, configured to determine the file type of the image file according to the key field names contained in the recognized text information.
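The execution subunit's keyword-based decision can be sketched as below. The OCR step is assumed to have already produced a text string, so no OCR library is called. The keyword sets are hypothetical ("贷款金额" and "保单号" do not appear in the source), and picking the file type with the most keyword hits is one possible rule, not one the source specifies.

```python
from typing import Dict, Optional, Set

# Hypothetical key field names per file type.
FILE_TYPE_KEYWORDS: Dict[str, Set[str]] = {
    "loan_form": {"贷款人姓名", "贷款金额"},
    "insurance_policy": {"投保人姓名", "保单号"},
}


def classify_file_type(ocr_text: str) -> Optional[str]:
    """Return the file type whose key field names occur most often in the OCR text."""
    best_type: Optional[str] = None
    best_hits = 0
    for file_type, keywords in FILE_TYPE_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in ocr_text)
        if hits > best_hits:
            best_type, best_hits = file_type, hits
    # None means no key field name was found in the recognized text.
    return best_type
```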
In one embodiment, the data verification apparatus further includes a display unit, configured to obtain the verification result of verifying the field value data under the first field name against the field value data under the second field name in the target verification file, and to display the verification result.
In one embodiment, the data verification apparatus further includes: a third acquiring unit, configured to obtain training set sample data used to train the first machine learning model to be trained, where each sample in the training set sample data includes a business type, the file type of an image file, and the first field names to be verified in the image file; and a training unit, configured to train the first machine learning model to be trained on the training set sample data to obtain the trained first machine learning model.
In one embodiment, the data verification apparatus further includes: a fourth acquiring unit, configured to obtain test set sample data used to check the trained first machine learning model, where each sample in the test set sample data includes a business type, the file type of an image file, and the first field names to be verified in the image file; a fourth execution unit, configured to input the business type and the image file type of each sample in the test set sample data into the trained first machine learning model, which outputs the predicted first field names to be verified in the image file; and a detection unit, configured to take the trained first machine learning model as the pre-trained first machine learning model if the number of samples in the test set sample data whose first field names to be verified are all consistent with the predicted first field names, taken as a proportion of the total number of samples in the test set sample data, exceeds a predetermined proportion threshold.
For details of how the functions and roles of each module in the above apparatus are implemented, see the implementation of the corresponding steps in the data verification method described above; they are not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments disclosed in the present application, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure may therefore be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the method according to the embodiments of the present disclosure.
An exemplary embodiment of the present disclosure further provides an electronic device capable of implementing the above method.
Those skilled in the art will understand that aspects of the present application may be implemented as a system, a method, or a program product. Accordingly, aspects of the present application may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, and the like), or an implementation combining hardware and software, any of which may be referred to herein as a "circuit", "module", or "system".
Referring to FIG. 7, FIG. 7 is an example block diagram of an electronic device for implementing the above data verification method according to an exemplary embodiment of the present application. The electronic device 700 shown in FIG. 7 is only an example and should not limit the functions or scope of use of the embodiments of the present application.
As shown in FIG. 7, the electronic device 700 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, and a bus 730 connecting the different system components (including the storage unit 720 and the processing unit 710).
The storage unit stores program code that can be executed by the processing unit 710, causing the processing unit 710 to perform the steps according to the various exemplary embodiments of the present application described in the "Exemplary Method" section of this specification. For example, the processing unit 710 may perform the following steps:
obtaining the business type of a target business and the image file that the target business needs to verify; determining the file type of the image file according to the image file, determining the file type of the target verification file to be used according to the business type and the file type of the image file, and determining the data source identifier and the file identifier of the target verification file according to the image file, where the target verification file is a file used to verify the image file; inputting the business type and the file type of the image file into a pre-trained first machine learning model, which outputs the first field name to be verified in the image file of that file type, the pre-trained first machine learning model being trained on sample data that includes business types, image file types, and the first field names to be verified in image files; inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name, the pre-trained second machine learning model being trained on sample data that includes business types, file types of target verification files, and the second field names in target verification files used to verify the field value data under first field names, where the field value data under the second field name is used to verify the field value data under the first field name; obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source information and the file identifier of the target verification file; and verifying the field value data under the first field name against the field value data under the second field name in the target verification file.
The storage unit 720 may include readable media in the form of volatile storage, such as a random access memory (RAM) unit 7201 and/or a cache unit 7202, and may further include a read-only memory (ROM) unit 7203.
The storage unit 720 may also include a program/utility 7204 having a set of (at least one) program modules 7205. Such program modules 7205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 700 may also communicate with one or more external devices 900 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router or a modem) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 740. The electronic device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 760. As shown in the figure, the network adapter 760 communicates with the other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure may therefore be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the embodiments of the present disclosure.
An exemplary embodiment of the present disclosure further provides a computer-readable storage medium, which may be volatile or non-volatile, on which is stored a program product capable of implementing the method described above in this specification. The computer-readable storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining the business type of a target business and the image file that the target business needs to verify; determining the file type of the image file according to the image file, determining the file type of the target verification file to be used according to the business type and the file type of the image file, and determining the data source identifier and the file identifier of the target verification file according to the image file, where the target verification file is a file used to verify the image file; inputting the business type and the file type of the image file into a pre-trained first machine learning model, which outputs the first field name to be verified in the image file of that file type, the pre-trained first machine learning model being trained on sample data that includes business types, image file types, and the first field names to be verified in image files; inputting the file type of the image file, the business type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, which outputs the second field name in the target verification file used to verify the field value data under the first field name, the pre-trained second machine learning model being trained on sample data that includes business types, file types of target verification files, and the second field names in target verification files used to verify the field value data under first field names, where the field value data under the second field name is used to verify the field value data under the first field name; obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source information and the file identifier of the target verification file; and verifying the field value data under the first field name against the field value data under the second field name in the target verification file.
In some possible implementations, aspects of the present application may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present application described in the "Exemplary Method" section of this specification.
Referring to FIG. 8, FIG. 8 shows a computer-readable storage medium for implementing the above data verification method according to an exemplary embodiment of the present application. FIG. 8 depicts a program product 800 for implementing the above method according to an embodiment of the present application; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device such as a personal computer. However, the program product of the present application is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In another embodiment, to further ensure the privacy and security of all the data mentioned above, the data verification method provided by the present application may also store all such data in a node of a blockchain. For example, the image file, the first field name, and the second field name may all be stored in a blockchain node.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
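The idea of data blocks linked by cryptographic methods can be illustrated with a short Python sketch. It is a simplified, single-process illustration under stated assumptions, not the blockchain platform of this application: there is no network, consensus mechanism, or encryption, and the block layout and the function names `block_hash`, `append_block`, and `is_valid` are hypothetical. Each block stores the SHA-256 hash of its predecessor, so altering any stored record invalidates the chain.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index, prev_hash, records):
    """Hash the block contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a block that references the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["records"]):
            return False
        prev_hash = block["hash"]
    return True
```

In this sketch, `records` could hold items such as a first field name or a second field name; modifying a record after it has been appended causes `is_valid` to return `False`, which is the anti-counterfeiting property the paragraph above refers to.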
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present application and are not intended to be limiting. It is readily understood that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also readily understood that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.

Claims (20)

  1. A data verification method, comprising:
    obtaining a service type of a target service and an image file of the target service that needs to be verified;
    determining a file type of the image file according to the image file, determining a file type of a target verification file needed for verification according to the service type and the file type of the image file, and determining a data source identifier of the target verification file and a file identifier of the target verification file according to the image file, wherein the target verification file is a file used to verify the image file;
    inputting the service type and the file type of the image file into a pre-trained first machine learning model, and obtaining as output a first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is trained on sample data containing a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    inputting the file type of the image file, the service type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and obtaining as output a second field name in the target verification file that is used to verify the field value data under the first field name, wherein the pre-trained second machine learning model is trained on sample data containing a service type, a file type of a target verification file, and a second field name in the target verification file used to verify the field value data under a first field name, and the field value data under the second field name is used to verify the field value data under the first field name;
    obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
    verifying the field value data under the first field name based on the field value data under the second field name in the target verification file.
  2. The data verification method according to claim 1, wherein determining the file type of the image file according to the image file comprises:
    performing OCR character recognition on the image file to obtain recognized text information;
    determining the file type of the image file according to the key field names contained in the recognized text information.
  3. The data verification method according to claim 1, wherein, after verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, the data verification method further comprises:
    obtaining a verification result of verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, and displaying the verification result.
  4. The data verification method according to claim 1, wherein the data verification method further comprises:
    obtaining training set sample data for training a first machine learning model to be trained, wherein each piece of sample data in the training set sample data includes a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    training the first machine learning model to be trained with the training set sample data to obtain a trained first machine learning model.
  5. The data verification method according to claim 4, wherein, after training the first machine learning model to be trained with the training set sample data to obtain the trained first machine learning model, the data verification method further comprises:
    obtaining test set sample data for checking the trained first machine learning model, wherein each piece of sample data in the test set sample data includes a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    inputting the service type and the file type of the image file of each piece of sample data in the test set sample data into the trained first machine learning model, and obtaining as output a predicted first field name that needs to be verified in the image file;
    if the proportion of pieces of sample data in the test set sample data for which the first field names that need to be verified in the image file are all consistent with the predicted first field names, relative to the total number of pieces of sample data in the test set sample data, exceeds a predetermined ratio threshold, identifying the trained first machine learning model as the pre-trained first machine learning model.
  6. The data verification method according to claim 2, wherein the recognized text information includes a character data set obtained by recognizing all character data in the image file.
  7. The data verification method according to claim 6, wherein the character data set includes a character string corresponding to each field name in the image file and a character string corresponding to the field value data under each field name.
  8. A data verification apparatus, comprising:
    a first obtaining unit, configured to obtain a service type of a target service and an image file of the target service that needs to be verified;
    a first execution unit, configured to determine a file type of the image file according to the image file, determine a file type of a target verification file needed for verification according to the service type and the file type of the image file, and determine a data source identifier of the target verification file and a file identifier of the target verification file according to the image file, wherein the target verification file is a file used to verify the image file;
    a second execution unit, configured to input the service type and the file type of the image file into a pre-trained first machine learning model and obtain as output a first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is trained on sample data containing a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    a third execution unit, configured to input the file type of the image file, the service type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model and obtain as output a second field name in the target verification file that is used to verify the field value data under the first field name, wherein the pre-trained second machine learning model is trained on sample data containing a service type, a file type of a target verification file, and a second field name in the target verification file used to verify the field value data under a first field name, and the field value data under the second field name is used to verify the field value data under the first field name;
    a second obtaining unit, configured to obtain the field value data under the first field name according to the first field name, and obtain the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
    a verification unit, configured to verify the field value data under the first field name based on the field value data under the second field name in the target verification file.
  9. An electronic device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    obtaining a service type of a target service and an image file of the target service that needs to be verified;
    determining a file type of the image file according to the image file, determining a file type of a target verification file needed for verification according to the service type and the file type of the image file, and determining a data source identifier of the target verification file and a file identifier of the target verification file according to the image file, wherein the target verification file is a file used to verify the image file;
    inputting the service type and the file type of the image file into a pre-trained first machine learning model, and obtaining as output a first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is trained on sample data containing a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    inputting the file type of the image file, the service type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and obtaining as output a second field name in the target verification file that is used to verify the field value data under the first field name, wherein the pre-trained second machine learning model is trained on sample data containing a service type, a file type of a target verification file, and a second field name in the target verification file used to verify the field value data under a first field name, and the field value data under the second field name is used to verify the field value data under the first field name;
    obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
    verifying the field value data under the first field name based on the field value data under the second field name in the target verification file.
  10. The electronic device according to claim 9, wherein determining the file type of the image file according to the image file comprises:
    performing OCR character recognition on the image file to obtain recognized text information;
    determining the file type of the image file according to the key field names contained in the recognized text information.
  11. The electronic device according to claim 9, wherein, after verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, the computer-readable instructions, when executed by the processor, further cause the processor to perform the following step:
    obtaining a verification result of verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, and displaying the verification result.
  12. The electronic device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    obtaining training set sample data for training a first machine learning model to be trained, wherein each piece of sample data in the training set sample data includes a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    training the first machine learning model to be trained with the training set sample data to obtain a trained first machine learning model.
  13. The electronic device according to claim 12, wherein, after training the first machine learning model to be trained with the training set sample data to obtain the trained first machine learning model, the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    obtaining test set sample data for checking the trained first machine learning model, wherein each piece of sample data in the test set sample data includes a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    inputting the service type and the file type of the image file of each piece of sample data in the test set sample data into the trained first machine learning model, and obtaining as output a predicted first field name that needs to be verified in the image file;
    if the proportion of pieces of sample data in the test set sample data for which the first field names that need to be verified in the image file are all consistent with the predicted first field names, relative to the total number of pieces of sample data in the test set sample data, exceeds a predetermined ratio threshold, identifying the trained first machine learning model as the pre-trained first machine learning model.
  14. The electronic device according to claim 10, wherein the recognized text information includes a character data set obtained by recognizing all character data in the image file.
  15. The electronic device according to claim 14, wherein the character data set includes a character string corresponding to each field name in the image file and a character string corresponding to the field value data under each field name.
  16. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a service type of a target service and an image file of the target service that needs to be verified;
    determining a file type of the image file according to the image file, determining a file type of a target verification file needed for verification according to the service type and the file type of the image file, and determining a data source identifier of the target verification file and a file identifier of the target verification file according to the image file, wherein the target verification file is a file used to verify the image file;
    inputting the service type and the file type of the image file into a pre-trained first machine learning model, and obtaining as output a first field name that needs to be verified in the image file corresponding to the file type, wherein the pre-trained first machine learning model is trained on sample data containing a service type, a file type of an image file, and a first field name that needs to be verified in the image file;
    inputting the file type of the image file, the service type, the file type of the target verification file, and the first field name into a pre-trained second machine learning model, and obtaining as output a second field name in the target verification file that is used to verify the field value data under the first field name, wherein the pre-trained second machine learning model is trained on sample data containing a service type, a file type of a target verification file, and a second field name in the target verification file used to verify the field value data under a first field name, and the field value data under the second field name is used to verify the field value data under the first field name;
    obtaining the field value data under the first field name according to the first field name, and obtaining the target verification file according to the data source information of the target verification file and the file identifier of the target verification file;
    verifying the field value data under the first field name based on the field value data under the second field name in the target verification file.
  17. The storage medium according to claim 16, wherein determining the file type of the image file according to the image file comprises:
    performing OCR character recognition on the image file to obtain recognized text information;
    determining the file type of the image file according to the key field names contained in the recognized text information.
  18. The storage medium according to claim 16, wherein, after verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following step:
    obtaining a verification result of verifying the field value data under the first field name based on the field value data under the second field name in the target verification file, and displaying the verification result.
  19. The storage medium according to claim 17, wherein the recognized text information includes a character data set obtained by recognizing all character data in the image file.
  20. The storage medium according to claim 19, wherein the character data set includes a character string corresponding to each field name in the image file and a character string corresponding to the field value data under each field name.
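The test-set acceptance check recited in claims 5 and 13 can be sketched in Python as follows. This is an illustrative reading only, not the claimed implementation: the sample layout (`service_type`, `file_type`, `first_field_names`), the `predict` callable standing in for the trained first machine learning model, and the default threshold of 0.9 are all assumptions introduced here. A sample counts as a match only when every labeled first field name agrees with the prediction, and the model is accepted only when the matching proportion strictly exceeds the threshold.

```python
def accuracy(predict, test_samples):
    """Fraction of test samples whose predicted field names all match the labels."""
    if not test_samples:
        return 0.0
    matches = 0
    for sample in test_samples:
        predicted = predict(sample["service_type"], sample["file_type"])
        # Order of field names is treated as irrelevant (an assumption of this sketch).
        if set(predicted) == set(sample["first_field_names"]):
            matches += 1
    return matches / len(test_samples)


def accept_model(predict, test_samples, threshold=0.9):
    """Accept the trained model only if its accuracy exceeds the ratio threshold."""
    return accuracy(predict, test_samples) > threshold
```

A model that fails this check would not be used as the pre-trained first machine learning model; what happens next (for example, further training) is not specified by the claims and is left out of the sketch.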
PCT/CN2021/078082 2020-04-01 2021-02-26 Data checking method and apparatus, electronic device, and storage medium WO2021196935A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010249650.5 2020-04-01
CN202010249650.5A CN111598122B (en) 2020-04-01 2020-04-01 Data verification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021196935A1 (en) 2021-10-07

Family

ID=72183396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078082 WO2021196935A1 (en) 2020-04-01 2021-02-26 Data checking method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111598122B (en)
WO (1) WO2021196935A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760533A (en) * 2022-05-17 2022-07-15 北京达佳互联信息技术有限公司 Check value storage method, frame data check device and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598122B (en) * 2020-04-01 2022-02-08 深圳壹账通智能科技有限公司 Data verification method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8233751B2 (en) * 2006-04-10 2012-07-31 Patel Nilesh V Method and system for simplified recordkeeping including transcription and voting based verification
CN106127659A (en) * 2016-08-26 2016-11-16 南威软件股份有限公司 A kind of community grid management system
CN108388831A (en) * 2018-01-10 2018-08-10 链家网(北京)科技有限公司 A kind of identification of spare part and finish message method and device
CN109034816A (en) * 2018-06-08 2018-12-18 平安科技(深圳)有限公司 User information verification method, device, computer equipment and storage medium
CN109815792A (en) * 2018-12-13 2019-05-28 平安普惠企业管理有限公司 Picture file recognition methods, device, computer equipment and storage medium
CN110751110A (en) * 2019-10-24 2020-02-04 泰康保险集团股份有限公司 Identity image information verification method, device, equipment and storage medium
CN111598122A (en) * 2020-04-01 2020-08-28 深圳壹账通智能科技有限公司 Data verification method and device, electronic equipment and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
IL202028A (en) * 2009-11-10 2016-06-30 Icts Holding Company Ltd Product, apparatus and methods for computerized authentication of electronic documents
RU2641225C2 (en) * 2014-01-21 2018-01-16 Общество с ограниченной ответственностью "Аби Девелопмент" Method of detecting necessity of standard learning for verification of recognized text
CN107067044B (en) * 2017-05-31 2024-03-29 北京空间飞行器总体设计部 Financial reimbursement complete ticket intelligent auditing system
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
US10540579B2 (en) * 2018-05-18 2020-01-21 Sap Se Two-dimensional document processing
US10795752B2 (en) * 2018-06-07 2020-10-06 Accenture Global Solutions Limited Data validation
CN110619252B (en) * 2018-06-19 2022-11-04 百度在线网络技术(北京)有限公司 Method, device and equipment for identifying form data in picture and storage medium
US10452897B1 (en) * 2018-08-06 2019-10-22 Capital One Services, Llc System for verifying the identity of a user
CN110070081A (en) * 2019-03-13 2019-07-30 深圳壹账通智能科技有限公司 Automatic information input method, device, storage medium and electronic equipment
CN110288755B (en) * 2019-05-21 2023-05-23 平安银行股份有限公司 Invoice checking method based on text recognition, server and storage medium
CN110348975A (en) * 2019-05-24 2019-10-18 深圳壹账通智能科技有限公司 Customs declaration information calibration method and device, electronic equipment and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114760533A (en) * 2022-05-17 2022-07-15 北京达佳互联信息技术有限公司 Check value storage method, frame data check device and electronic equipment
CN114760533B (en) * 2022-05-17 2024-04-09 北京达佳互联信息技术有限公司 Check value storage method, frame data check method, device and electronic equipment

Also Published As

Publication number Publication date
CN111598122A (en) 2020-08-28
CN111598122B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
WO2021120677A1 (en) Warehousing model training method and device, computer device and storage medium
CN111210335B (en) User risk identification method and device and electronic equipment
WO2019200810A1 (en) User data authenticity analysis method and apparatus, storage medium and electronic device
WO2022174491A1 (en) Artificial intelligence-based method and apparatus for medical record quality control, computer device, and storage medium
CN111343162B (en) System secure login method, device, medium and electronic equipment
CN108921552B (en) Evidence verification method and device
EP4006909B1 (en) Method, apparatus and device for quality control and storage medium
WO2021196935A1 (en) Data checking method and apparatus, electronic device, and storage medium
CN112990294B (en) Training method and device of behavior discrimination model, electronic equipment and storage medium
WO2019056496A1 (en) Method for generating picture review probability interval and method for picture review determination
WO2020232902A1 (en) Abnormal object identification method and apparatus, computing device, and storage medium
WO2021174814A1 (en) Answer verification method and apparatus for crowdsourcing task, computer device, and storage medium
US20150178346A1 (en) Using biometric data to identify data consolidation issues
CN105354506B (en) The method and apparatus of hidden file
US11222143B2 (en) Certified information verification services
WO2021072864A1 (en) Text similarity acquisition method and apparatus, and electronic device and computer-readable storage medium
WO2020252925A1 (en) Method and apparatus for searching user feature group for optimized user feature, electronic device, and computer nonvolatile readable storage medium
WO2020252880A1 (en) Reverse turing verification method and apparatus, storage medium, and electronic device
CN111210109A (en) Method and device for predicting user risk based on associated user and electronic equipment
US11687574B2 (en) Record matching in a database system
WO2022105120A1 (en) Text detection method and apparatus from image, computer device and storage medium
CN115545753A (en) Partner prediction method based on Bayesian algorithm and related equipment
CN111859985B (en) AI customer service model test method and device, electronic equipment and storage medium
CN111369375A (en) Social relationship determination method, device, equipment and storage medium
CN115150196B (en) Ciphertext data-based anomaly detection method, device and equipment under normal distribution

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21778922

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.03.2023)

122 EP: PCT application non-entry in European phase

Ref document number: 21778922

Country of ref document: EP

Kind code of ref document: A1