CN118115293A - Identity document verification method, device, equipment and storage medium thereof - Google Patents

Info

Publication number
CN118115293A
Authority
CN
China
Prior art keywords
identity document
text
target
result
identity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410323313.4A
Other languages
Chinese (zh)
Inventor
郭喜亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Health Insurance Company of China Ltd
Original Assignee
Ping An Health Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Health Insurance Company of China Ltd filed Critical Ping An Health Insurance Company of China Ltd
Priority to CN202410323313.4A priority Critical patent/CN118115293A/en
Publication of CN118115293A publication Critical patent/CN118115293A/en
Pending legal-status Critical Current

Landscapes

  • Character Input (AREA)

Abstract

The embodiments of the application belong to the technical field of digital healthcare, apply to identity document verification in medical insurance claim settlement, and relate to an identity document verification method, device, equipment and storage medium. The method comprises: obtaining sample images of different identity documents in batches and training an identity document type recognition model; acquiring a test image of an identity document to be identified; identifying the type of the identity document to be identified with the identity document type recognition model; calling a preset structured analysis template to perform structured analysis on the content of the identity document; and judging whether the target text to be verified is verified successfully. Valid identity information is extracted quickly from the identity document provided by the client and checked against the target text to be verified that the client filled in, so as to determine whether the client identity information recorded in the target claim document is valid, which helps improve the efficiency of medical insurance claim settlement.

Description

Identity document verification method, device, equipment and storage medium thereof
Technical Field
The application relates to the technical field of digital medical treatment, and is applied to an identity document verification scene in a medical insurance claim settlement service, in particular to an identity document verification method, an identity document verification device, identity document verification equipment and a storage medium thereof.
Background
With the development of the computer industry and artificial intelligence and the arrival of the big data era, the traditional medical model is gradually shifting to a digital one. Because health insurance products span many insurance types that vary with the disease category and with individual differences between patients, medical claim settlement is complicated.
In insurance application, underwriting, claim settlement and similar stages, the real-name management measures for personal insurance require the applicant, the insured and the beneficiary to provide valid identity documents. Valid identity documents include resident identity cards, temporary resident identity cards, military identity cards, armed police identity cards, household registers, birth certificates, passports, mainland travel permits for Hong Kong and Macao residents, mainland travel permits for Taiwan residents, residence permits for Hong Kong and Macao residents, foreigners' residence permits and other legal identity documents. The document types are numerous. If recognition and parsing were developed separately for each type, a great deal of manpower would be required and much of the development work would be repeated or similar, because the local similarity among the documents would not be exploited. The resulting variety of recognition models also leaves the company with a difficult choice of which model to use, which seriously affects the efficiency of claim settlement and verification.
Disclosure of Invention
The embodiments of the application aim to provide an identity document verification method, device, equipment and storage medium, so as to solve the problem that, in actual claim settlement and verification scenarios, the variety of identity document recognition models leaves the company with a difficult choice of model and seriously affects the efficiency of claim settlement and verification.
In order to solve the technical problems, the embodiment of the application provides an identity document verification method, which adopts the following technical scheme:
A method of verifying an identity document comprising the steps of:
obtaining sample images of different identity documents in batches from a preset sample library;
Inputting the sample image into an identity document type recognition model to be trained, and performing model training to obtain a trained identity document type recognition model;
Acquiring a test image of an identity document to be identified;
Inputting the test image into the identity document type recognition model after training, and recognizing the type of the identity document to be recognized according to the identity document type recognition model after training;
Calling a preset structured analysis template based on the type of the identity document to be identified, and carrying out structured analysis on the content in the identity document to obtain a structured analysis result;
comparing the structured analysis result with a target text to be verified, wherein the target text to be verified is customer identity information recorded in a target claim document;
if the comparison shows that the structured analysis result completely contains the target text to be verified, the target text to be verified is verified successfully;
if the comparison shows that the structured analysis result does not completely contain the target text to be verified, the target text to be verified fails verification.
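The final comparison step can be pictured with a minimal sketch. The source only states that the structured analysis result must "completely contain" the target text; the field names, the per-field equality test and the normalization rules below are assumptions made for illustration, not the patented procedure.

```python
import unicodedata


def normalize(value: str) -> str:
    """Apply NFKC normalization, drop all whitespace, and casefold a field value."""
    folded = unicodedata.normalize("NFKC", value)
    return "".join(folded.split()).casefold()


def verify_target_text(parsed: dict[str, str], target: dict[str, str]) -> bool:
    """Return True only if every target field is present in the parsed result
    with an equal value after normalization."""
    for field, claimed in target.items():
        extracted = parsed.get(field)
        if extracted is None or normalize(extracted) != normalize(claimed):
            return False
    return True


# Hypothetical structured analysis result and claim-form entries.
parsed_result = {"name": "Zhang San", "id_number": "11010119900101123X", "sex": "M"}
claim_form = {"name": "zhang  san", "id_number": "11010119900101123x"}
print(verify_target_text(parsed_result, claim_form))  # True
```

Extra fields in the parsed result are ignored, which matches the one-directional "contains" wording; a missing or mismatched field fails verification.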
Further, the identity document type recognition model to be trained includes an analysis component, a first extraction component, a second extraction component and a clustering component, the sample image is input into the identity document type recognition model to be trained, model training is performed, and a step of obtaining a trained identity document type recognition model specifically includes:
Performing content analysis and structure analysis on the sample image by adopting the analysis component to obtain a text content analysis result and a structure characteristic analysis result;
According to the text content analysis result and the structural feature analysis result, adopting the first extraction component to extract text content and structural features of the sample image, and obtaining a text content extraction result and a structural feature extraction result;
Inputting the sample image, the text content extraction result and the structural feature extraction result to the second extraction component, and extracting entity information of a target type, wherein the target type comprises a certificate title type and a fixed field type;
According to the extracted entity information of the target type, the position layout information corresponding to the entity information of the target type and the clustering processing component, clustering is carried out on sample images of different identity documents obtained in batches, and a target clustering result is obtained;
and verifying the target clustering result according to a preset verification form, if the target clustering result passes the verification, completing training the identity document type recognition model to be trained, and obtaining the trained identity document type recognition model, wherein the preset verification form comprises a reference clustering result corresponding to the sample image.
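The source does not say how the target clustering result is checked against the reference clustering result in the preset verification form. A minimal sketch, assuming an exact match of the groupings regardless of how clusters are numbered, could look like this; the function names and image identifiers are hypothetical.

```python
def as_partition(labels: dict[str, int]) -> set[frozenset[str]]:
    """Convert an image-id -> cluster-label map into a label-independent set of groups."""
    groups: dict[int, set[str]] = {}
    for image_id, label in labels.items():
        groups.setdefault(label, set()).add(image_id)
    return {frozenset(members) for members in groups.values()}


def clustering_passes(target: dict[str, int], reference: dict[str, int]) -> bool:
    """Pass verification only when both clusterings group the images identically."""
    return as_partition(target) == as_partition(reference)


# Cluster numbers differ, but the groupings are the same, so verification passes.
target_result = {"a.jpg": 0, "b.jpg": 0, "c.jpg": 1}
reference_result = {"a.jpg": 7, "b.jpg": 7, "c.jpg": 3}
print(clustering_passes(target_result, reference_result))  # True
```

A production system might instead accept a similarity score above a threshold; exact matching is simply the strictest reading of "passes the verification".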
Further, the analysis component includes a text detection sub-component, a position layout marking sub-component and a region dividing sub-component, and the steps of adopting the analysis component to perform content analysis and structure analysis on the sample image to obtain a text content analysis result and a structure feature analysis result specifically include:
detecting text contents contained in the sample image based on the text detection sub-component, and detecting position layout information of all the text contents in the sample image respectively to obtain a detection result;
Marking the position layout information of all text contents by adopting the position layout marking sub-component according to the detection result;
and according to the position layout information of all the text contents, the text contents in the same position area are sorted and divided by adopting the regional division sub-assembly, and regional division results are obtained.
Further, the identification model of identity document type to be trained further includes a slice segmentation component and a slice adjustment component, and before executing the steps of extracting text content and extracting structural features from the sample image by using the first extraction component according to the text content analysis result and the structural feature analysis result, the method further includes:
Based on the regional division result, calling the slice segmentation component to carry out segmentation processing on the sample image so as to obtain a sample image slice;
Inputting the sample image slices into the slice adjustment assembly, and carrying out text-direction forward adjustment on all the sample image slices based on the slice adjustment assembly to obtain all the sample image slices after forward adjustment.
Further, the first extraction component includes a text content extraction sub-component and a location information extraction sub-component, and the steps of performing text content extraction and structural feature extraction on the sample image by using the first extraction component according to the text content analysis result and the structural feature analysis result, to obtain a text content extraction result and a structural feature extraction result specifically include:
Respectively extracting text contents from all sample image slices subjected to forward adjustment through the text content extraction sub-assembly to obtain text content extraction results;
And extracting position layout information of all text contents in the sample image respectively according to the position information extraction sub-assembly to obtain a structural feature extraction result.
Further, the step of clustering the sample images of the different identity documents obtained in batch according to the extracted entity information of the target type, the position layout information corresponding to the entity information of the target type and the clustering component to obtain a target clustering result specifically includes:
the entity information of all the extracted certificate title types is identified, so that certificate titles corresponding to all the sample images and position layout information of the certificate titles corresponding to all the sample images are obtained;
According to the certificate titles corresponding to all the sample images and the clustering component, carrying out clustering treatment on all the sample images, and adding the sample images of the same certificate title into the same clustering cluster to obtain a preliminary clustering treatment result;
Obtaining the fixed fields corresponding to all the sample images and the position layout information of those fixed fields by identifying the entity information of all the extracted fixed field types;
And performing secondary clustering on the preliminary clustering result according to the position layout information of the fixed fields corresponding to all the sample images and the clustering component, adding the sample images whose fixed fields have the same position layout information to the same cluster, to obtain the target clustering result.
Further, the step of inputting the test image to the training-completed identity document type recognition model, and recognizing the type of the identity document to be recognized according to the training-completed identity document type recognition model specifically includes:
Performing content analysis and structure analysis on the test image by adopting the analysis component to obtain a text content analysis result and a structural feature analysis result;
according to the text content analysis result and the structural feature analysis result, adopting the first extraction component to extract text content and structural feature of the test image, and obtaining a text content extraction result and a structural feature extraction result;
Inputting the test image, the text content extraction result and the structural feature extraction result to the second extraction component, extracting the entity information of the target type, and extracting the position layout information corresponding to the entity information of the target type;
identifying a cluster to which the test image belongs according to the entity information of the target type and the position layout information corresponding to the entity information of the target type;
And identifying the type of the identity document to be identified based on the cluster to which the test image belongs.
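The recognition steps above assign a test image to an existing cluster using its certificate title and the layout of its fixed fields. The following sketch assumes each cluster is summarized by a title and reference field positions in normalized image coordinates, and that "same layout" means every fixed field lies within a small tolerance; the class, the tolerance value and the coordinates are invented for illustration.

```python
from dataclasses import dataclass
from math import hypot

Layout = dict[str, tuple[float, float]]  # fixed field text -> normalized (x, y)


@dataclass
class ClusterProfile:
    doc_type: str
    title: str
    fixed_fields: Layout


def layout_distance(a: Layout, b: Layout) -> float:
    """Largest displacement of any fixed field; infinite if the field sets differ."""
    if a.keys() != b.keys():
        return float("inf")
    return max((hypot(a[k][0] - b[k][0], a[k][1] - b[k][1]) for k in a), default=0.0)


def identify_type(title: str, fixed_fields: Layout,
                  clusters: list[ClusterProfile], tolerance: float = 0.05):
    """Return the doc_type of the closest cluster with a matching title, or None."""
    best, best_distance = None, tolerance
    for cluster in clusters:
        if cluster.title != title:
            continue
        distance = layout_distance(fixed_fields, cluster.fixed_fields)
        if distance <= best_distance:
            best, best_distance = cluster, distance
    return best.doc_type if best else None


# Made-up coordinates: two formats that share a title but differ in layout.
clusters = [
    ClusterProfile("resident_id_card_gen1", "Resident Identity Card",
                   {"Name": (0.40, 0.12), "Sex": (0.40, 0.27)}),
    ClusterProfile("resident_id_card_gen2", "Resident Identity Card",
                   {"Name": (0.08, 0.12), "Sex": (0.08, 0.27)}),
]
test_fields = {"Name": (0.09, 0.13), "Sex": (0.07, 0.28)}
print(identify_type("Resident Identity Card", test_fields, clusters))
```

Returning `None` for an unmatched image leaves room for a fallback such as manual review, which the source does not discuss.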
In order to solve the technical problems, the embodiment of the application also provides an identity document verification device, which adopts the following technical scheme:
an identity document verification apparatus comprising:
the sample image acquisition module is used for acquiring sample images of different identity documents in batches from a preset sample library;
The identification model training module is used for inputting the sample image into an identification model of the identity document type to be trained, and carrying out model training to obtain a trained identification model of the identity document type;
the test image acquisition module is used for acquiring a test image of the identity document to be identified;
The identification model identification module is used for inputting the test image into the identity document type identification model after training, and identifying the type of the identity document to be identified according to the identity document type identification model after training;
The structural analysis module is used for calling a preset structural analysis template based on the type of the identity document to be identified, and carrying out structural analysis on the content in the identity document to obtain a structural analysis result;
The verification comparison module is used for comparing the structured analysis result with a target text to be verified, wherein the target text to be verified is customer identity information recorded in a target claim document;
The first condition judging module is used for determining that the target text to be verified is verified successfully if the comparison shows that the structured analysis result completely contains the target text to be verified;
And the second condition judging module is used for determining that the target text to be verified fails verification if the comparison shows that the structured analysis result does not completely contain the target text to be verified.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
a computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the identity document verification method described above.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor perform the steps of the identity document verification method as described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
According to the identity document verification method, sample images of different identity documents are obtained in batches from a preset sample library and an identity document type recognition model is trained; a test image of an identity document to be identified is acquired; the type of the identity document to be identified is identified with the identity document type recognition model; a preset structured analysis template is called to perform structured analysis on the content of the identity document; and the structured analysis result is compared with the target text to be verified to judge whether the target text to be verified is verified successfully. Valid identity information is extracted quickly from the identity document provided by the client and checked against the target text to be verified that the client filled in, so as to determine whether the client identity information recorded in the target claim document is valid, which helps improve the efficiency of medical insurance claim settlement.
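The source does not describe what a preset structured analysis template contains. One way to picture it, purely as an assumption, is a per-document-type table of field names and label patterns that is applied to the recognized text lines. The template names, English field labels and regular expressions below are hypothetical.

```python
import re

# Hypothetical templates: document type -> {field name: pattern with one capture group}.
TEMPLATES = {
    "resident_id_card": {
        "name": re.compile(r"^Name[:\s]+(.+)$"),
        "id_number": re.compile(r"^ID Number[:\s]+(\w{18})$"),
    },
    "passport": {
        "name": re.compile(r"^Name[:\s]+(.+)$"),
        "passport_number": re.compile(r"^Passport No\.?[:\s]+(\w+)$"),
    },
}


def parse_document(doc_type: str, lines: list[str]) -> dict[str, str]:
    """Apply the template chosen by doc_type to recognized text lines."""
    template = TEMPLATES[doc_type]
    result: dict[str, str] = {}
    for line in lines:
        for field, pattern in template.items():
            match = pattern.match(line.strip())
            if match and field not in result:  # keep the first hit per field
                result[field] = match.group(1).strip()
    return result


ocr_lines = ["Name: Zhang San", "Sex: M", "ID Number: 11010119900101123X"]
print(parse_document("resident_id_card", ocr_lines))
```

Selecting the template by recognized type is what lets a single pipeline serve many document kinds without a separate recognition model for each.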
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method of verifying an identity document in accordance with the present application;
FIG. 3 is a flow chart of one embodiment of step 202 of FIG. 2;
FIG. 4 is a flow chart of one embodiment of step 301 shown in FIG. 3;
FIG. 5 is a flow chart illustrating one embodiment of step 304 of FIG. 3;
FIG. 6 is a flow chart illustrating one embodiment of step 204 of FIG. 2;
FIG. 7 is a schematic view of the structure of one embodiment of an identity document verification apparatus according to the present application;
FIG. 8 is a schematic structural view of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the identity document verification method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the identity document verification apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow chart of one embodiment of a method of verifying an identity document according to the present application is shown. The identity document verification method comprises the following steps:
Step 201, obtaining sample images of different identity documents in batches from a preset sample library.
In this embodiment, the preset sample library is a database dedicated to storing identity document images in the medical insurance claim service. The different identity documents include resident identity cards, temporary resident identity cards, military identity cards, armed police identity cards, household registers, birth certificates, passports, mainland travel permits for Hong Kong and Macao residents, mainland travel permits for Taiwan residents, residence permits for Hong Kong and Macao residents, foreigners' residence permits, and the like. The sample images include photographed images, printed images, photocopied images, and the like.
Sample images of different identity documents are obtained in batches from a preset sample library, so that the sample images of different identity documents obtained in batches can be combined conveniently, and an identification model suitable for identifying various identity documents can be trained.
Step 202, inputting the sample image into an identity document type recognition model to be trained, and performing model training to obtain a trained identity document type recognition model.
In this embodiment, the identity document type recognition model to be trained includes an analysis component, a first extraction component, a second extraction component, and a clustering component.
With continued reference to FIG. 3, FIG. 3 is a flow chart of one embodiment of step 202 of FIG. 2, including:
Step 301, performing content analysis and structure analysis on the sample image by adopting the analysis component to obtain a text content analysis result and a structure characteristic analysis result;
In this embodiment, the analysis component is mainly used for detecting the text content in a sample image and the position layout information of that text content within the image. The analysis component may perform text detection with DBNet, a segmentation-based text detection model, whose main advantages are a simple structure and high detection speed; this benefits both the training of the identity document type recognition model and the subsequent recognition of test images with the trained model.
In this embodiment, the analysis component includes a text detection sub-component, a position layout marking sub-component, and a region dividing sub-component.
With continued reference to fig. 4, fig. 4 is a flow chart illustrating one embodiment of step 301 shown in fig. 3, including:
Step 401, detecting text content contained in the sample image based on the text detection subassembly, and detecting position layout information of all text content in the sample image respectively, so as to obtain a detection result;
Step 402, marking the position layout information of all text contents by using the position layout marking sub-component according to the detection result;
Step 403, according to the position layout information of all the text contents, using the region dividing sub-component to sort and group the text contents located in the same position region, so as to obtain a regional division result.
Specifically, grouping the text contents located in the same position region into one region ensures that the subsequent slice segmentation is reasonable.
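Steps 401 to 403 can be illustrated with a small sketch of the region division. The source does not define what counts as "the same position region"; the version below assumes that detected text boxes whose vertical centers are close belong to one region, which is only one plausible rule. The `TextBox` type and the tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def divide_into_regions(boxes: list[TextBox], tolerance: float = 0.5) -> list[list[TextBox]]:
    """Group boxes with nearby vertical centers into one region, each sorted left to right."""
    regions: list[list[TextBox]] = []
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        center = box.y + box.h / 2
        if regions:
            last = regions[-1][-1]
            last_center = last.y + last.h / 2
            # Same region when the centers differ by at most a fraction of the box height.
            if abs(center - last_center) <= tolerance * max(box.h, last.h):
                regions[-1].append(box)
                continue
        regions.append([box])
    return [sorted(region, key=lambda b: b.x) for region in regions]


boxes = [
    TextBox("Zhang San", 60, 21, 80, 12),
    TextBox("Sex", 10, 50, 30, 12),
    TextBox("Name", 10, 20, 40, 12),
    TextBox("M", 60, 49, 10, 12),
]
for region in divide_into_regions(boxes):
    print([box.text for box in region])
```

Each resulting region is a natural unit for the later slice segmentation, since a label and its value end up in the same slice.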
Step 302, according to the text content analysis result and the structural feature analysis result, performing text content extraction and structural feature extraction on the sample image by adopting the first extraction component to obtain a text content extraction result and a structural feature extraction result;
In this embodiment, the identity document type recognition model to be trained further includes a slice segmentation component and a slice adjustment component.
In this embodiment, before the step of performing the text content extraction and the structural feature extraction on the sample image according to the text content analysis result and the structural feature analysis result by using the first extraction component to obtain a text content extraction result and a structural feature extraction result, the method further includes: based on the regional division result, calling the slice segmentation component to carry out segmentation processing on the sample image so as to obtain a sample image slice; inputting the sample image slices into the slice adjustment assembly, and carrying out text-direction forward adjustment on all the sample image slices based on the slice adjustment assembly to obtain all the sample image slices after forward adjustment.
Specifically, a convolutional neural network may be used to extract structural features from each sample image slice, where the structural features include structural information of text content in each sample image slice, and perform direction judgment on the detected text content in each sample image slice; and carrying out text direction forward adjustment on all the sample image slices according to the direction judgment result to obtain all the sample image slices subjected to forward adjustment.
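Once a direction has been judged for a slice, the forward adjustment itself is a plain rotation. The sketch below assumes the direction judgment returns a clockwise rotation of 0, 90, 180 or 270 degrees and represents a slice as a nested list of pixel values; the convolutional classifier is out of scope here and its output is simply passed in as `predicted_angle`.

```python
Grid = list[list[int]]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def make_upright(slice_pixels: Grid, predicted_angle: int) -> Grid:
    """Undo a clockwise rotation of predicted_angle degrees (0, 90, 180 or 270)."""
    if predicted_angle not in (0, 90, 180, 270):
        raise ValueError("angle must be one of 0, 90, 180, 270")
    turns = (360 - predicted_angle) // 90 % 4  # remaining clockwise quarter turns
    for _ in range(turns):
        slice_pixels = rotate_cw(slice_pixels)
    return slice_pixels


upright = [[1, 2, 3],
           [4, 5, 6]]
sideways = rotate_cw(upright)  # simulate a slice captured rotated by 90 degrees
print(make_upright(sideways, 90) == upright)  # True
```

Normalizing orientation before text extraction means the text content extraction sub-component only ever sees forward-facing text.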
In this embodiment, the first extraction component includes a text content extraction sub-component and a location information extraction sub-component.
In this embodiment, the step of performing text content extraction and structural feature extraction on the sample image by using the first extraction component according to the text content analysis result and the structural feature analysis result to obtain a text content extraction result and a structural feature extraction result specifically includes: respectively extracting text contents from all sample image slices subjected to forward adjustment through the text content extraction sub-assembly to obtain text content extraction results; and extracting position layout information of all text contents in the sample image respectively according to the position information extraction sub-assembly to obtain a structural feature extraction result.
And respectively extracting text contents and position layout information of all the text contents in the sample images by carrying out forward adjustment on all the sample image slices, so that the entity information of a target type can be conveniently extracted according to the text content extraction result and the structural feature extraction result.
Step 303, inputting the sample image, the text content extraction result and the structural feature extraction result to the second extraction component, and extracting entity information of a target type, wherein the target type comprises a certificate title type and a fixed field type;
Specifically, the second extraction component may be an entity information extraction component based on LayoutLMv3. LayoutLMv3 combines information of multiple modalities, such as the sample image, the text content extraction result and the structural feature extraction result, and splits the whole text into entities of different types, such as a certificate title type, a fixed field type, a variable field type and an irrelevant information type. The certificate title type is used to extract the title of a certificate, which is generally centered at the top of the certificate, for example "birth medical certificate", "MEDICAL CERTIFICATE OF BIRTH", "sub-page of the birth medical certificate", or "resident registration card". The fixed field type refers to fields that are fixed across different samples of the same type of certificate, where different samples of the same type of certificate are different formats of the same identity document, such as the first-generation and second-generation resident identity cards. The variable field type refers to fields that change with personal information, such as male/female, the identity card number value and home address information. The irrelevant information type refers to text in the background of the image that was captured inadvertently.
The multi-modal information extraction model LayoutLMv3 combines information of multiple modalities, such as the sample image, the text content extraction result and the structural feature extraction result, to obtain the different types of entity information in the sample image. This facilitates the subsequent training of the identity document type recognition model on the entity information of the target types (certificate title type and fixed field type).
Step 304, clustering is carried out on sample images of different identity documents obtained in batches according to the extracted entity information of the target type, the position layout information corresponding to the entity information of the target type and the clustering processing component, so as to obtain a target clustering result;
With continued reference to fig. 5, fig. 5 is a flowchart of one embodiment of step 304 of fig. 3, including:
Step 501, obtaining the certificate titles corresponding to all the sample images and the position layout information of the certificate titles corresponding to all the sample images by identifying the entity information of all the extracted certificate title types;
Step 502, clustering all the sample images according to the certificate titles corresponding to all the sample images and the clustering component, and adding the sample images with the same certificate title to the same cluster to obtain a preliminary clustering result;
Through preliminary clustering, sample images of all formats of the same identity document can be aggregated together to obtain a cluster corresponding to that identity document.
Step 503, obtaining the fixed fields corresponding to all the sample images and the position layout information of the fixed fields corresponding to all the sample images by identifying the entity information of all the extracted fixed field types;
Step 504, performing secondary clustering on the preliminary clustering result according to the position layout information of the fixed fields corresponding to all the sample images and the clustering component, and adding the sample images whose fixed fields have the same position layout information to the same cluster to obtain the target clustering result.
Sample images of different formats of the same identity document can be respectively aggregated together through secondary clustering processing, and clustering clusters respectively corresponding to the different formats of the same identity document are obtained.
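The two-stage clustering of steps 501 to 504 can be sketched as follows. This is a minimal illustration only: the source does not specify the clustering algorithm, so exact grouping on the title and on a coarsely quantized layout signature is used here as a stand-in, and the record layout, the field names and the `grid` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-image records: a certificate title plus fixed fields with
# normalized (x, y) positions in 0..1. None of these names come from the source.
samples = [
    {"id": "a", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0.10, 0.20), "ID Number": (0.10, 0.80)}},
    {"id": "b", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0.11, 0.21), "ID Number": (0.10, 0.79)}},
    {"id": "c", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0.50, 0.30), "ID Number": (0.50, 0.90)}},
    {"id": "d", "title": "Medical Certificate of Birth",
     "fixed_fields": {"Name": (0.20, 0.30)}},
]

def layout_signature(fixed_fields, grid=0.1):
    """Quantize field positions to a coarse grid so near-identical layouts match."""
    return tuple(sorted(
        (name, round(x / grid), round(y / grid))
        for name, (x, y) in fixed_fields.items()
    ))

def cluster_two_stage(records):
    # Stage 1 (preliminary clustering): same certificate title, same cluster.
    by_title = defaultdict(list)
    for record in records:
        by_title[record["title"]].append(record)
    # Stage 2 (secondary clustering): split each title cluster by the
    # position layout of its fixed fields.
    clusters = defaultdict(list)
    for title, members in by_title.items():
        for record in members:
            key = (title, layout_signature(record["fixed_fields"]))
            clusters[key].append(record["id"])
    return dict(clusters)

target_clusters = sorted(cluster_two_stage(samples).values())
```

With the sample data above, images "a" and "b" share a title and a layout and fall into one cluster, while "c" (same title, different layout) and "d" (different title) each form their own cluster.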
Step 305, verifying the target clustering result according to a preset verification form, if the target clustering result passes the verification, training the identity document type recognition model to be trained is completed, and the trained identity document type recognition model is obtained, wherein the preset verification form contains a reference clustering result corresponding to the sample image.
Specifically, the target clustering result is checked against the preset verification form. If it passes, training of the identity document type recognition model to be trained is complete and the trained identity document type recognition model is obtained; that is, the model now has the corresponding recognition capability.
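The check in step 305 can be sketched as a comparison of two partitions of the sample images. The source only states that the preset verification form contains a reference clustering result, so requiring an exact match and ignoring cluster labels are assumptions of this sketch, as are the function names.

```python
def as_partition(clusters):
    """Turn an iterable of clusters into a label-independent set of frozensets."""
    return {frozenset(members) for members in clusters}

def clustering_passes(target_clusters, reference_clusters):
    """Return True when both clusterings group the sample images identically."""
    return as_partition(target_clusters) == as_partition(reference_clusters)

# Hypothetical sample ids; the order of clusters and of members does not matter.
passed = clustering_passes([["a", "b"], ["c"]], [["c"], ["b", "a"]])
failed = clustering_passes([["a"], ["b", "c"]], [["a", "b"], ["c"]])
```

A tolerance-based comparison (for example, accepting a small fraction of misplaced images) would be an equally plausible reading; the source does not say which is intended.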
Step 203, a test image of the identity document to be identified is obtained.
In this embodiment, the test image of the identity document to be identified refers to an image of the identity document provided by the customer in the medical insurance claim settlement service. Because customers differ in identity, they may provide different types of identity documents.
Step 204, inputting the test image into the trained identity document type recognition model, and recognizing the type of the identity document to be recognized according to the trained identity document type recognition model.
With continued reference to FIG. 6, FIG. 6 is a flow chart of one embodiment of step 204 of FIG. 2, comprising:
Step 601, performing content analysis and structure analysis on the test image by adopting the analysis component to obtain a text content analysis result and a structural feature analysis result;
Specifically, the analysis component used during recognition is the same as the one used during training, so it likewise includes the text detection sub-component, the position layout marking sub-component and the region dividing sub-component. The text detection sub-component detects the text content contained in the test image and the position layout information of all the text content in the test image, yielding a detection result; the position layout marking sub-component marks the position layout information of all the text content according to the detection result; and the region dividing sub-component sorts and divides the text content located in the same position area according to the position layout information of all the text content, yielding the region division result of the test image.
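As a rough illustration of the region division described above, the following sketch groups detected text boxes into regions by vertical proximity. The box format `(text, x, y)` in pixels, the `max_gap` threshold and the sample values are assumptions made for the example; the source does not describe how the region dividing sub-component is implemented.

```python
def divide_regions(boxes, max_gap=20):
    """Group (text, x, y) boxes into regions: a new region starts whenever
    the vertical gap to the previous box exceeds max_gap pixels."""
    regions = []
    last_y = None
    for text, x, y in sorted(boxes, key=lambda b: (b[2], b[1])):
        if last_y is None or y - last_y > max_gap:
            regions.append([])
        regions[-1].append((text, x, y))
        last_y = y
    # Within each region, order the text from left to right.
    return [[t for t, _, _ in sorted(r, key=lambda b: b[1])] for r in regions]

# Made-up detection result for illustration only.
boxes = [
    ("Zhang San", 120, 102), ("Name", 40, 100),
    ("ID Number", 40, 300), ("11010519491231002X", 160, 298),
]
regions = divide_regions(boxes)
```

Here the two boxes near y = 100 form one region and the two boxes near y = 300 form another, each ordered left to right.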
Step 602, according to the text content analysis result and the structural feature analysis result, performing text content extraction and structural feature extraction on the test image by adopting the first extraction component to obtain a text content extraction result and a structural feature extraction result;
Before step 602 is executed, the slice segmentation component may likewise be invoked to segment the test image based on the region division result of the test image, yielding test image slices; the test image slices are then input into the slice adjustment component, which forward-adjusts the text direction of all the test image slices, yielding the forward-adjusted test image slices. When step 602 is executed, the text content extraction sub-component extracts the text content from each forward-adjusted test image slice, yielding the text content extraction result, and the position information extraction sub-component extracts the position layout information of all the text content in the test image, yielding the structural feature extraction result.
Step 603, inputting the test image, the text content extraction result and the structural feature extraction result to the second extraction component, extracting the entity information of the target type, and extracting the position layout information corresponding to the entity information of the target type;
Specifically, the analysis component, the first extraction component and the second extraction component of the trained identity document type recognition model are used to extract the entity information of the target type and the position layout information corresponding to that entity information, so that the type of the identity document to be identified can conveniently be identified in combination with the clusters obtained during training.
Step 604, identifying a cluster to which the test image belongs according to the entity information of the target type and the position layout information corresponding to the entity information of the target type;
Step 605, identifying the type of the identity document to be identified based on the cluster to which the test image belongs.
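Steps 604 and 605 can be sketched as a lookup against the clusters obtained during training. In this sketch the cluster index, the type labels, the distance measure and the `max_distance` threshold are all hypothetical; the source states only that the cluster is identified from the target-type entity information and its position layout information.

```python
import math

# Hypothetical cluster index produced at training time: each cluster carries a
# document type label, a certificate title and reference positions (normalized
# to 0..1) for its fixed fields.
CLUSTERS = [
    {"type": "resident_id_card_gen1", "title": "Resident Identity Card",
     "fields": {"Name": (0.10, 0.20), "ID Number": (0.10, 0.80)}},
    {"type": "resident_id_card_gen2", "title": "Resident Identity Card",
     "fields": {"Name": (0.50, 0.30), "ID Number": (0.50, 0.90)}},
    {"type": "birth_certificate", "title": "Medical Certificate of Birth",
     "fields": {"Name": (0.20, 0.30)}},
]

def layout_distance(fields_a, fields_b):
    """Mean Euclidean distance over the fields; inf if the field sets differ."""
    if not fields_a or set(fields_a) != set(fields_b):
        return math.inf
    total = sum(math.dist(fields_a[k], fields_b[k]) for k in fields_a)
    return total / len(fields_a)

def identify_type(title, fields, max_distance=0.05):
    """Pick the closest cluster with the same title, or None if nothing fits."""
    candidates = [c for c in CLUSTERS if c["title"] == title]
    best = min(candidates,
               key=lambda c: layout_distance(fields, c["fields"]),
               default=None)
    if best is None or layout_distance(fields, best["fields"]) > max_distance:
        return None
    return best["type"]

doc_type = identify_type(
    "Resident Identity Card",
    {"Name": (0.11, 0.21), "ID Number": (0.10, 0.79)},
)
```

Returning `None` for an unmatched document is a design choice of the sketch; it corresponds loosely to the case in which a new identity document type appears and supplementary training is needed.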
Step 205, calling a preset structured analysis template based on the type of the identity document to be identified, and carrying out structured analysis on the content in the identity document to obtain a structured analysis result.
In this embodiment, before the step of calling a preset structured analysis template based on the type of the identity document to be identified and performing structured analysis on the content of the identity document to obtain a structured analysis result is executed, the method further includes: obtaining the structured analysis templates set in advance for the different identity document types.
In this embodiment, the preset structural analysis template is specifically constructed according to the extracted entity information of the target type in the sample image and the position layout information corresponding to the entity information of the target type in the sample image.
Specifically, the structured analysis templates which are set in advance according to different identity document types respectively comprise different structured analysis templates which are set according to different identity documents respectively, and also comprise different structured analysis templates which are set according to different formats of the same identity document respectively.
In this embodiment, the step of calling a preset structural analysis template based on the type of the identity document to be identified, and performing structural analysis on the content in the identity document to obtain a structural analysis result specifically includes: and calling a corresponding structured analysis template according to the type of the identity document to be identified, and carrying out structured analysis on the content in the identity document to obtain a document title and a fixed field in the identity document to be identified and value data respectively corresponding to the fixed field.
In this embodiment, the step of obtaining the value data corresponding to each fixed field includes: determining the text category of the value data, wherein the text category comprises a character category, a letter category, a number category and a date category; if the value data belongs to the character category or the letter category, acquiring the value data by character string extraction; if the value data belongs to the number category, acquiring the value data by regular expression extraction; and if the value data belongs to the date category, acquiring the value data by extraction in a preset unified date format.
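The category-dependent extraction described above might look like the following sketch. The category names, the regular expressions and the choice of ISO 8601 (`YYYY-MM-DD`) as the unified date format are assumptions made for illustration; the source names the three extraction modes but not their details.

```python
import re
from datetime import date

# Assumed patterns: a run of at least six digits with an optional trailing
# check character "X", and a year-month-day date with arbitrary separators.
NUMBER_RE = re.compile(r"[0-9]{6,}[0-9Xx]?")
DATE_RE = re.compile(r"([0-9]{4})[^0-9]+([0-9]{1,2})[^0-9]+([0-9]{1,2})")

def extract_value(raw, category):
    """Extract the value of a fixed field according to its text category."""
    raw = raw.strip()
    if category in ("character", "letter"):
        # Character string extraction: keep the trimmed text as-is.
        return raw
    if category == "number":
        # Regular expression extraction.
        match = NUMBER_RE.search(raw)
        return match.group(0).upper() if match else None
    if category == "date":
        # Unified date format extraction: normalize to YYYY-MM-DD.
        match = DATE_RE.search(raw)
        if not match:
            return None
        year, month, day = (int(g) for g in match.groups())
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            return None  # not a valid calendar date
    raise ValueError(f"unknown text category: {category}")

name = extract_value("  Zhang San ", "character")
id_number = extract_value("ID: 11010519491231002x", "number")
birth = extract_value("1990年1月2日", "date")
```

Normalizing dates to a single format before comparison means that "1990年1月2日" on the document and "1990-01-02" in the claim form can be treated as the same value.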
Step 206, comparing the structured analysis result with a target text to be verified, wherein the target text to be verified is the customer identity information recorded in the target claim document.
Step 207, if the comparison shows that the structured analysis result completely contains the target text to be verified, the target text to be verified is verified successfully.
Step 208, if the comparison shows that the structured analysis result does not completely contain the target text to be verified, verification of the target text to be verified fails.
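The containment check of steps 206 to 208 can be sketched as follows, assuming that both the structured analysis result and the target text to be verified are available as field-to-value mappings. That representation, the whitespace and case normalization, and the field names are assumptions of this sketch rather than details given in the source.

```python
def normalize(text):
    """Remove all whitespace and ignore case before comparing values."""
    return "".join(text.split()).casefold()

def verify(parsed, target):
    """Return True only if every field of the target text to be verified
    appears in the structured analysis result with the same value."""
    parsed_norm = {field: normalize(value) for field, value in parsed.items()}
    return all(
        parsed_norm.get(field) == normalize(value)
        for field, value in target.items()
    )

# Made-up values for illustration only.
parsed_result = {
    "Name": "Zhang San",
    "ID Number": "11010519491231002X",
    "Sex": "Male",
}
ok = verify(parsed_result, {"Name": "zhang san", "ID Number": "11010519491231002X"})
bad = verify(parsed_result, {"Name": "Zhang San", "ID Number": "110105194912310000"})
```

The structured analysis result may hold more fields than the claim document records (here "Sex"); verification fails only when a field in the target text is missing from the result or has a different value.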
In this embodiment, model training is used: sample images of a batch of identity document types are obtained from a database dedicated to storing identity document images in the medical insurance claim settlement service, and the identity document type recognition model is trained on them. When a customer's identity document is recognized in the medical insurance claim settlement service, the customer's valid identity information is extracted from the identity document the customer provides and is used to verify the target text to be verified filled in by the customer, that is, to determine whether the customer identity information recorded in the target claim document is valid, which facilitates medical insurance claim settlement. When a new identity document type appears, sample images of the new type can be obtained separately for incremental supplementary training and analysis template construction, so that the set of supported identity document types is progressively completed. A single integrated identity document type recognition model thus replaces a plurality of separate recognition models, which reduces the selection difficulty in medical insurance claim settlement.
According to the application, sample images of different identity documents are obtained in batches from a preset sample library and an identity document type recognition model is trained; a test image of an identity document to be identified is acquired; the type of the identity document to be identified is identified according to the identity document type recognition model; a preset structured analysis template is called to perform structured analysis on the content of the identity document; and the structured analysis result is compared with the target text to be verified to determine whether the target text to be verified is verified successfully. The customer's valid identity information is thus extracted quickly from the identity document provided by the customer and used to verify the target text to be verified filled in by the customer, that is, to determine whether the customer identity information recorded in the target claim document is valid, which improves the efficiency of medical insurance claim settlement.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
With further reference to fig. 7, as an implementation of the method shown in fig. 2 described above, the present application provides an embodiment of an identity document verification apparatus, which corresponds to the method embodiment shown in fig. 2, which is particularly applicable in various electronic devices.
As shown in fig. 7, the identity document verification apparatus 700 according to the present embodiment includes: a sample image acquisition module 701, an identification model training module 702, a test image acquisition module 703, an identification model identification module 704, a structural parsing module 705, a verification comparison module 706, a first condition judgment module 707, and a second condition judgment module 708. Wherein:
The sample image obtaining module 701 is configured to obtain sample images of different identity documents in batches from a preset sample library;
The recognition model training module 702 is configured to input the sample image into an identity document type recognition model to be trained, perform model training, and obtain a trained identity document type recognition model;
A test image acquisition module 703, configured to acquire a test image of an identity document to be identified;
The recognition model recognition module 704 is configured to input the test image into the trained identity document type recognition model, and recognize the type of the identity document to be recognized according to the trained identity document type recognition model;
The structural analysis module 705 is configured to invoke a preset structural analysis template based on the type of the identity document to be identified, and perform structural analysis on the content in the identity document to obtain a structural analysis result;
The verification comparison module 706 is configured to compare the structured analysis result with a target text to be verified, where the target text to be verified is customer identity information recorded in a target claim document;
A first condition judgment module 707, configured to determine that the target text to be verified is verified successfully if the comparison shows that the structured analysis result completely contains the target text to be verified;
And a second condition judgment module 708, configured to determine that verification of the target text to be verified fails if the comparison shows that the structured analysis result does not completely contain the target text to be verified.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by computer readable instructions stored on a computer readable storage medium, and that these instructions, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 8, fig. 8 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 8 comprises a memory 8a, a processor 8b and a network interface 8c communicatively connected to each other via a system bus. It should be noted that only the computer device 8 having components 8a-8c is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and that more or fewer components may alternatively be implemented. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 8a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 8a may be an internal storage unit of the computer device 8, such as a hard disk or an internal memory of the computer device 8. In other embodiments, the memory 8a may also be an external storage device of the computer device 8, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) provided on the computer device 8. Of course, the memory 8a may also comprise both an internal storage unit of the computer device 8 and an external storage device. In this embodiment, the memory 8a is typically used to store an operating system and various application software installed on the computer device 8, such as computer readable instructions for the identity document verification method. Further, the memory 8a may be used to temporarily store various types of data that have been output or are to be output.
The processor 8b may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 8b is typically used to control the overall operation of the computer device 8. In this embodiment, the processor 8b is configured to execute the computer readable instructions stored in the memory 8a or to process data, for example to execute the computer readable instructions of the identity document verification method.
The network interface 8c may comprise a wireless network interface or a wired network interface, which network interface 8c is typically used to establish a communication connection between the computer device 8 and other electronic devices.
The computer device provided by this embodiment belongs to the technical field of digital medical treatment and is applied to an identity document verification scenario in a medical insurance claim settlement service. According to the application, sample images of different identity documents are obtained in batches from a preset sample library and an identity document type recognition model is trained; a test image of an identity document to be identified is acquired; the type of the identity document to be identified is identified according to the identity document type recognition model; a preset structured analysis template is called to perform structured analysis on the content of the identity document; and the structured analysis result is compared with the target text to be verified to determine whether the target text to be verified is verified successfully. The customer's valid identity information is thus extracted quickly from the identity document provided by the customer and used to verify the target text to be verified filled in by the customer, that is, to determine whether the customer identity information recorded in the target claim document is valid, which improves the efficiency of medical insurance claim settlement.
The present application also provides another embodiment, namely, a computer readable storage medium storing computer readable instructions executable by a processor to cause the processor to perform the steps of the identity document verification method as described above.
The computer readable storage medium provided by this embodiment belongs to the technical field of digital medical treatment and is applied to an identity document verification scenario in a medical insurance claim settlement service. According to the application, sample images of different identity documents are obtained in batches from a preset sample library and an identity document type recognition model is trained; a test image of an identity document to be identified is acquired; the type of the identity document to be identified is identified according to the identity document type recognition model; a preset structured analysis template is called to perform structured analysis on the content of the identity document; and the structured analysis result is compared with the target text to be verified to determine whether the target text to be verified is verified successfully. The customer's valid identity information is thus extracted quickly from the identity document provided by the customer and used to verify the target text to be verified filled in by the customer, that is, to determine whether the customer identity information recorded in the target claim document is valid, which improves the efficiency of medical insurance claim settlement.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or that equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims (10)

1. A method of verifying an identity document, comprising the steps of:
obtaining sample images of different identity documents in batches from a preset sample library;
Inputting the sample image into an identity document type recognition model to be trained, and performing model training to obtain a trained identity document type recognition model;
Acquiring a test image of an identity document to be identified;
Inputting the test image into the identity document type recognition model after training, and recognizing the type of the identity document to be recognized according to the identity document type recognition model after training;
Calling a preset structured analysis template based on the type of the identity document to be identified, and carrying out structured analysis on the content in the identity document to obtain a structured analysis result;
Comparing the structured analysis result with a target text to be verified, wherein the target text to be verified is customer identity information recorded in a target claim document;
If the comparison shows that the structured analysis result completely contains the target text to be verified, the target text to be verified is verified successfully;
If the comparison shows that the structured analysis result does not completely contain the target text to be verified, verification of the target text to be verified fails.
2. The identity document verification method according to claim 1, wherein the identity document type recognition model to be trained comprises an analysis component, a first extraction component, a second extraction component and a clustering component, and the step of inputting the sample image into the identity document type recognition model to be trained and performing model training to obtain a trained identity document type recognition model specifically comprises:
Performing content analysis and structure analysis on the sample image by adopting the analysis component to obtain a text content analysis result and a structure characteristic analysis result;
According to the text content analysis result and the structural feature analysis result, adopting the first extraction component to extract text content and structural features of the sample image, and obtaining a text content extraction result and a structural feature extraction result;
Inputting the sample image, the text content extraction result and the structural feature extraction result to the second extraction component, and extracting entity information of a target type, wherein the target type comprises a certificate title type and a fixed field type;
According to the extracted entity information of the target type, the position layout information corresponding to the entity information of the target type and the clustering processing component, clustering is carried out on sample images of different identity documents obtained in batches, and a target clustering result is obtained;
and verifying the target clustering result according to a preset verification form, if the target clustering result passes the verification, completing training the identity document type recognition model to be trained, and obtaining the trained identity document type recognition model, wherein the preset verification form comprises a reference clustering result corresponding to the sample image.
3. The identity document verification method according to claim 2, wherein the analysis component comprises a text detection sub-component, a position layout marking sub-component and a region division sub-component, and the step of performing content analysis and structure analysis on the sample image by using the analysis component to obtain a text content analysis result and a structural feature analysis result specifically comprises:
detecting, based on the text detection sub-component, the text contents contained in the sample image and the position layout information of each text content in the sample image, to obtain a detection result;
marking the position layout information of all the text contents by using the position layout marking sub-component according to the detection result;
and sorting and dividing the text contents located in the same position region by using the region division sub-component according to the position layout information of all the text contents, to obtain a region division result.
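The region division step can be sketched as follows. The sketch assumes that the detection result is a list of text boxes with pixel coordinates and that "the same position region" means boxes lying on roughly the same horizontal band; both are assumptions, as are the `TextBox` type, the `tolerance` parameter and the sample values.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def divide_into_regions(boxes: list[TextBox], tolerance: float = 0.5) -> list[list[TextBox]]:
    """Group boxes whose vertical centers are close into one region,
    then sort each region from left to right."""
    regions: list[list[TextBox]] = []
    for box in sorted(boxes, key=lambda b: b.y + b.height / 2):
        center = box.y + box.height / 2
        if regions:
            last = regions[-1][-1]
            last_center = last.y + last.height / 2
            if abs(center - last_center) <= tolerance * last.height:
                regions[-1].append(box)
                continue
        regions.append([box])
    return [sorted(region, key=lambda b: b.x) for region in regions]


# Hypothetical detection result for a two-line document.
detected = [
    TextBox("Zhang San", 100, 12, 80, 20),
    TextBox("ID No.", 10, 50, 60, 20),
    TextBox("Name", 10, 10, 40, 20),
    TextBox("ID-0001", 100, 49, 90, 20),
]
for region in divide_into_regions(detected):
    print([box.text for box in region])
```

Grouping by horizontal band is only one way to define a region; a real layout analyzer might also use column structure or detected table cells.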
4. The identity document verification method according to claim 3, wherein the identity document type recognition model to be trained further comprises a slice segmentation component and a slice adjustment component, and before the step of performing text content extraction and structural feature extraction on the sample image by using the first extraction component according to the text content analysis result and the structural feature analysis result, the method further comprises:
calling the slice segmentation component to segment the sample image based on the region division result, to obtain sample image slices;
and inputting the sample image slices into the slice adjustment component, and adjusting all the sample image slices based on the slice adjustment component so that their text direction is forward, to obtain all the forward-adjusted sample image slices.
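A minimal sketch of slicing and forward adjustment is given below. It models an image as a nested list of pixel values and assumes that the text direction of each slice is already known as a number of clockwise quarter turns needed to make it forward; how the direction is detected is not described in the claim and is not attempted here. All names are hypothetical.

```python
Image = list[list[int]]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut a rectangular slice out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate_clockwise(image: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def to_forward(image_slice: Image, clockwise_quarter_turns: int) -> Image:
    """Apply the given number of clockwise quarter turns (assumed to be known)."""
    for _ in range(clockwise_quarter_turns % 4):
        image_slice = rotate_clockwise(image_slice)
    return image_slice


page = [
    [1, 2, 3],
    [4, 5, 6],
]
piece = crop(page, top=0, left=1, height=2, width=2)
print(piece)                 # prints [[2, 3], [5, 6]]
print(to_forward(piece, 1))  # prints [[5, 2], [6, 3]]
```

In practice an image library would perform the cropping and rotation; plain lists are used here only to keep the example self-contained.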
5. The identity document verification method according to claim 4, wherein the first extraction component comprises a text content extraction sub-component and a position information extraction sub-component, and the step of performing text content extraction and structural feature extraction on the sample image by using the first extraction component according to the text content analysis result and the structural feature analysis result to obtain a text content extraction result and a structural feature extraction result specifically comprises:
extracting text content from each of the forward-adjusted sample image slices by using the text content extraction sub-component, to obtain the text content extraction result;
and extracting the position layout information of each text content in the sample image by using the position information extraction sub-component, to obtain the structural feature extraction result.
6. The identity document verification method according to claim 2 or 5, wherein the step of clustering the sample images of the different identity documents acquired in batches according to the extracted entity information of the target type, the position layout information corresponding to the entity information of the target type, and the clustering component to obtain a target clustering result specifically comprises:
identifying all the extracted entity information of the certificate title type, to obtain the certificate titles corresponding to all the sample images and the position layout information of those certificate titles;
clustering all the sample images according to the certificate titles corresponding to all the sample images and the clustering component, and adding the sample images having the same certificate title to the same cluster, to obtain a preliminary clustering result;
identifying all the extracted entity information of the fixed field type, to obtain the fixed fields corresponding to all the sample images and the position layout information of those fixed fields;
and performing secondary clustering on the preliminary clustering result according to the position layout information of the fixed fields corresponding to all the sample images and the clustering component, and adding the sample images whose fixed fields have the same position layout information to the same cluster, to obtain the target clustering result.
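The two-stage clustering of claim 6 can be sketched as a grouping by certificate title followed by a grouping by fixed-field layout. The sketch assumes that the position layout information of each fixed field has already been quantized to a coarse (row, column) grid cell so that "the same position layout information" can be tested by equality. The data structure, the grid quantization and the sample values are assumptions.

```python
from collections import defaultdict


def two_stage_cluster(samples: list[dict]) -> dict[tuple, list[str]]:
    """Stage 1 groups samples by certificate title; stage 2 splits each
    group by the layout of its fixed fields."""
    by_title: dict[str, list[dict]] = defaultdict(list)
    for sample in samples:
        by_title[sample["title"]].append(sample)  # preliminary clustering

    clusters: dict[tuple, list[str]] = defaultdict(list)
    for title, members in by_title.items():
        for sample in members:
            # Assumed layout signature: sorted (field name, grid cell) pairs.
            layout_key = tuple(sorted(sample["fixed_fields"].items()))
            clusters[(title, layout_key)].append(sample["id"])  # secondary clustering
    return dict(clusters)


# Hypothetical samples: img3 shares a title with img1/img2 but has another layout.
samples = [
    {"id": "img1", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0, 0), "ID Number": (3, 0)}},
    {"id": "img2", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0, 0), "ID Number": (3, 0)}},
    {"id": "img3", "title": "Resident Identity Card",
     "fixed_fields": {"Name": (0, 0), "ID Number": (4, 0)}},
    {"id": "img4", "title": "Passport",
     "fixed_fields": {"Name": (1, 0)}},
]
result = two_stage_cluster(samples)
for members in result.values():
    print(members)
```

Splitting by layout after splitting by title lets different versions of a document with the same title fall into separate clusters, which is the apparent purpose of the secondary clustering.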
7. The identity document verification method according to claim 6, wherein the step of inputting the test image into the trained identity document type recognition model and identifying the type of the identity document to be identified according to the trained identity document type recognition model specifically comprises:
performing content analysis and structure analysis on the test image by using the analysis component, to obtain a text content analysis result and a structural feature analysis result;
performing text content extraction and structural feature extraction on the test image by using the first extraction component according to the text content analysis result and the structural feature analysis result, to obtain a text content extraction result and a structural feature extraction result;
inputting the test image, the text content extraction result and the structural feature extraction result into the second extraction component, extracting the entity information of the target type, and extracting the position layout information corresponding to the entity information of the target type;
identifying the cluster to which the test image belongs according to the entity information of the target type and the position layout information corresponding to the entity information of the target type;
and identifying the type of the identity document to be identified based on the cluster to which the test image belongs.
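The last two steps of claim 7 amount to looking up the cluster that matches the test image and reading off the document type associated with it. The sketch below assumes that each trained cluster is keyed by a certificate title together with a quantized fixed-field layout and has been assigned a type name; the key format, the type names and the behaviour for an unmatched image (returning `None`) are all assumptions.

```python
from typing import Optional


def layout_key(title: str, fixed_fields: dict[str, tuple[int, int]]) -> tuple:
    """Build a hashable signature from the title and the fixed-field layout."""
    return (title, tuple(sorted(fixed_fields.items())))


def identify_document_type(
    title: str,
    fixed_fields: dict[str, tuple[int, int]],
    cluster_types: dict[tuple, str],
) -> Optional[str]:
    """Return the type of the matching cluster, or None if no cluster matches."""
    return cluster_types.get(layout_key(title, fixed_fields))


# Hypothetical table built after training: cluster signature -> document type.
cluster_types = {
    layout_key("Resident Identity Card", {"Name": (0, 0), "ID Number": (3, 0)}): "id_card_v2",
    layout_key("Passport", {"Name": (1, 0)}): "passport",
}

print(identify_document_type("Passport", {"Name": (1, 0)}, cluster_types))  # prints passport
```

The identified type is what the subsequent step uses to select the preset structured analysis template.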
8. An identity document verification apparatus, comprising:
a sample image acquisition module, configured to acquire sample images of different identity documents in batches from a preset sample library;
a recognition model training module, configured to input the sample images into an identity document type recognition model to be trained and perform model training, to obtain a trained identity document type recognition model;
a test image acquisition module, configured to acquire a test image of an identity document to be identified;
a recognition model identification module, configured to input the test image into the trained identity document type recognition model and identify the type of the identity document to be identified according to the trained identity document type recognition model;
a structured analysis module, configured to call a preset structured analysis template based on the type of the identity document to be identified and perform structured analysis on the content of the identity document, to obtain a structured analysis result;
a verification comparison module, configured to compare the structured analysis result with a target text to be verified, wherein the target text to be verified is customer identity information recorded in a target claim document;
a first condition judging module, configured to determine that the target text to be verified is successfully verified if the comparison shows that the structured analysis result completely contains the target text to be verified;
and a second condition judging module, configured to determine that the target text to be verified fails verification if the comparison shows that the structured analysis result does not completely contain the target text to be verified.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the identity document verification method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor perform the steps of the identity document verification method of any one of claims 1 to 7.
CN202410323313.4A 2024-03-20 2024-03-20 Identity document verification method, device, equipment and storage medium thereof Pending CN118115293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410323313.4A CN118115293A (en) 2024-03-20 2024-03-20 Identity document verification method, device, equipment and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410323313.4A CN118115293A (en) 2024-03-20 2024-03-20 Identity document verification method, device, equipment and storage medium thereof

Publications (1)

Publication Number Publication Date
CN118115293A (en) 2024-05-31

Family

ID=91221114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410323313.4A Pending CN118115293A (en) 2024-03-20 2024-03-20 Identity document verification method, device, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN118115293A (en)

Similar Documents

Publication Publication Date Title
CN112417096B (en) Question-answer pair matching method, device, electronic equipment and storage medium
CN109034069B (en) Method and apparatus for generating information
CN113722438B (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN109062972A (en) Web page classification method, device and computer readable storage medium
CN111898550B (en) Expression recognition model building method and device, computer equipment and storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN116704528A (en) Bill identification verification method, device, computer equipment and storage medium
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN113609833B (en) Dynamic file generation method and device, computer equipment and storage medium
CN115690819A (en) Big data-based identification method and system
CN118115293A (en) Identity document verification method, device, equipment and storage medium thereof
CN113688268B (en) Picture information extraction method, device, computer equipment and storage medium
CN113988223B (en) Certificate image recognition method, device, computer equipment and storage medium
CN114820211B (en) Method, device, computer equipment and storage medium for checking and verifying quality of claim data
CN116467166A (en) Defect information processing method, device, equipment and storage medium thereof
CN117234505A (en) Interactive page generation method, device, equipment and storage medium thereof
CN117422270A (en) Material auditing method, device, equipment and storage medium thereof
CN118094297A (en) Medical data identification method, device, equipment and medium based on artificial intelligence
CN117493563A (en) Session intention analysis method, device, equipment and storage medium thereof
CN116822454A (en) Formula configuration method, device, computer equipment and storage medium
CN117056488A (en) Data complement method, device, equipment and storage medium based on artificial intelligence
CN116665646A (en) Dialect data automatic screening and identifying method, device, equipment and storage medium thereof
CN117197814A (en) Data standardization method, device, equipment and storage medium thereof
CN117423125A (en) Image detection method, device, equipment and storage medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination