CN112396060B - Identification card recognition method based on identification card segmentation model and related equipment thereof


Info

Publication number
CN112396060B
CN112396060B (application number CN202011286478.7A)
Authority
CN
China
Prior art keywords
identity card
result
convolution
segmentation model
picture
Prior art date
Legal status
Active
Application number
CN202011286478.7A
Other languages
Chinese (zh)
Other versions
CN112396060A (en)
Inventor
Xiong Jun (熊军)
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202011286478.7A
Publication of CN112396060A
Application granted
Publication of CN112396060B


Classifications

    • G06V 30/153: Segmentation of character regions using recognition of characters or words
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2431: Classification techniques relating to the number of classes; multiple classes
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 30/10: Character recognition

Abstract

The embodiments of the present application belong to the technical field of artificial intelligence and are applied in the field of intelligent government affairs. They relate to an identity card recognition method based on an identity card segmentation model and related equipment thereof. The method comprises: labeling a received identity card picture set to generate an identity card image training set; inputting an identity card image from the identity card image training set into a preset identity card segmentation model to obtain a first prediction result; iteratively training the identity card segmentation model to obtain a trained identity card segmentation model; acquiring an identity card picture to be recognized and inputting it into the trained identity card segmentation model to obtain a second prediction result; and screening the plurality of tag frames contained in the second prediction result to obtain target tag frames, intercepting the key fields in the identity card picture based on the target tag frames, and inputting the key fields into a pre-trained text recognition model to obtain a recognition result. The identity card image training set can be stored in a blockchain. The method and device effectively improve the accuracy of identity card recognition.

Description

Identification card recognition method based on identification card segmentation model and related equipment thereof
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an identification card recognition method based on an identification card segmentation model and related equipment thereof.
Background
The identity card is an effective certificate proving a citizen's legal identity and is required in many aspects of daily life. In particular, when passengers travel by public transport, a computer is often needed to recognize and record their identity cards. It is therefore important that a computer can quickly and accurately recognize and read identity card information.
In the existing recognition method, template matching is used for character detection to obtain the key information of the identity card, and character recognition is then performed on the detected key information regions. However, the traditional template matching method has very low tolerance to lighting, noise and slight viewing-angle changes. Identity cards are divided into Han-nationality identity cards and minority-nationality identity cards; a minority-nationality identity card carries both Chinese characters and minority-script characters, which introduces extra interference during recognition. The template matching method therefore has poor robustness, and the accuracy of identity card recognition is low.
Disclosure of Invention
The embodiment of the application aims to provide an identity card recognition method based on an identity card segmentation model and related equipment thereof, so that the accuracy of identity card recognition is improved.
In order to solve the above technical problems, the embodiment of the present application provides an identification card recognition method based on an identification card segmentation model, which adopts the following technical scheme:
an identification card recognition method based on an identification card segmentation model comprises the following steps:
receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set;
inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on an atrous spatial pyramid pooling (ASPP) module to obtain a first prediction result;
calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, so as to obtain a trained identity card segmentation model;
acquiring an identity card picture to be identified, inputting the identity card picture to be identified into the trained identity card segmentation model, obtaining a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
screening the plurality of tag frames to obtain a target tag frame, and intercepting key fields in the identity card picture based on the target tag frame;
Inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
Further, the step of inputting the identity card picture into the preset identity card segmentation model based on the ASPP module to obtain a first prediction result includes:
performing, through the identity card segmentation model, multiple downsampling on the identity card picture to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result;
upsampling the output result to obtain an upsampled result;
acquiring the identity card picture after the first downsampling, and performing low-level feature collection and convolution on the identity card picture after the first downsampling to obtain a convolution result;
and splicing the upsampled result and the convolution result to obtain a spliced result, and sequentially performing convolution and upsampling on the spliced result to obtain the first prediction result.
Further, the step of inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result includes:
inputting the downsampling result respectively into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 convolution kernel using atrous convolution at a first sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a second sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a third sampling rate, and a pooling layer, to obtain respective intermediate results;
and splicing all intermediate results, and passing the spliced result sequentially through two convolution layers with 3×5 convolution kernels and one convolution layer with a 1×1 convolution kernel to obtain the output result.
Further, the step of performing low-level feature collection and convolution on the identity card picture after the first downsampling to obtain a convolution result includes:
performing low-level feature collection on the identity card picture after the first downsampling and inputting the result into a convolution layer with a 1×1 convolution kernel to obtain the convolution result;
the step of sequentially performing convolution and upsampling on the spliced result to obtain the first prediction result includes:
convolving the spliced result with a 3×5 convolution kernel and performing 2× upsampling to obtain the first prediction result.
Further, the second prediction result includes class values, and the step of screening the plurality of tag frames to obtain a target tag frame and intercepting key fields in the identity card picture based on the target tag frame includes:
obtaining class values corresponding to all tag frames in the second prediction result;
classifying the tag frames based on the class values to obtain a plurality of classes, wherein the class values in each class are the same;
In the same category, taking the tag frame meeting the preset condition as the target tag frame of the current category until the target tag frame of each category is respectively determined;
obtaining the position of a key field corresponding to each category based on all the target tag frames;
and based on the position of the key field, intercepting the key field in the identity card picture.
Further, the step of inputting the intercepted key field into the pre-trained text recognition model to obtain a recognition result includes:
determining a target label frame corresponding to the intercepted key field and a class value carried by the target label frame;
determining whether there is a duplicate class value;
when no repeated class value exists, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when repeated class values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining the preset character set corresponding to each result to be judged based on the class value;
identifying whether the characters in each result to be judged appear in the corresponding character set;
and taking the result to be judged whose characters appear in the character set as the recognition result.
In order to solve the technical problem, the embodiment of the application also provides an identification card recognition device based on an identification card segmentation model, which adopts the following technical scheme:
an identification card recognition device based on an identification card segmentation model, comprising:
the picture labeling module is used for receiving the identity card picture set and labeling the identity card picture set to generate an identity card image training set;
the first input module is used for inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on an atrous spatial pyramid pooling (ASPP) module to obtain a first prediction result;
the loss calculation module is used for calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, so as to obtain a trained identity card segmentation model;
the second input module is used for acquiring an identity card picture to be identified, inputting the identity card picture to be identified into the trained identity card segmentation model, obtaining a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
The target screening module is used for screening the plurality of tag frames to obtain a target tag frame, and intercepting key fields in the identity card picture based on the target tag frame; and
and the text recognition module is used for inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:
the computer equipment comprises a memory and a processor, wherein the memory stores computer readable instructions, and the processor realizes the steps of the identification card identification method based on the identification card segmentation model when executing the computer readable instructions.
In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the identification card recognition method based on an identification card segmentation model described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
Compared with traditional template matching, the identity card segmentation model is more robust; in particular, its effect on minority-nationality identity cards subject to interference is markedly better than that of template matching. Because recognition is performed by the identity card segmentation model rather than by matching against a template, tolerance to lighting, noise and slight viewing-angle changes is higher, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target tag frames. This facilitates text detection by the text recognition model and improves the accuracy of identity card recognition.
Drawings
For a clearer description of the solution in the present application, a brief description will be given below of the drawings that are needed in the description of the embodiments of the present application, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method of identification card identification based on an identification card segmentation model according to the present application;
FIG. 3 is a schematic diagram of one embodiment of an identification device based on an identification segmentation model according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. computer device; 201. memory; 202. processor; 203. network interface; 300. identification card recognition device based on an identification card segmentation model; 301. picture labeling module; 302. first input module; 303. loss calculation module; 304. second input module; 305. target screening module; 306. text recognition module.
Description of the embodiments
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the identity card recognition method based on the identity card segmentation model provided in the embodiments of the present application is generally executed by a server/terminal device, and correspondingly, the identity card recognition device based on the identity card segmentation model is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of identification card identification based on an identification card segmentation model according to the present application is shown. The identification card recognition method based on the identification card segmentation model comprises the following steps:
S1: receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set.
In this embodiment, the identity card picture set serves as the training samples in supervised model training, and the labeling result serves as the ground truth for calculating the loss function.
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the identity card recognition method based on the identity card segmentation model runs may receive the identity card picture set through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G connections, Wi-Fi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra-wideband) connections, and other wireless connection means now known or developed in the future.
Specifically, the step of labeling the identity card picture set to obtain an identity card image training set includes:
displaying the identity card picture set in a preset front-end page, and notifying the user to label the identity card picture set;
after recognizing that the user has finished labeling each identity card picture in the identity card picture set with a preset labeling tool in the front-end page, identifying the content labeled by the user, where the content labeled by the user includes a user-drawn rectangular frame and the class value of the image inside the rectangular frame;
and generating a mask on each identity card picture based on the content labeled by the user, thereby finishing labeling the identity card picture set and obtaining the identity card image training set.
In this embodiment, a labeling tool is used to generate the mask; the labeling tool may be LabelImg (an object detection labeling tool). The classes include class 1 to class 5: the name field is labeled class 1, gender class 2, ethnicity class 3, address class 4, and identity card number class 5. The date of birth is not labeled because it can be derived from the identity card number. Labeling the identity card picture set to generate the identity card image training set facilitates the subsequent training of the identity card segmentation model.
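As an illustration of this labeling step, the following Python sketch turns user-drawn rectangles and class values into a mask. The (x_min, y_min, x_max, y_max, class_value) tuple format is an assumption about how the labeling tool exports its annotations, while the field-to-class mapping follows this embodiment.

    import numpy as np

    # Class values in this embodiment: 0 = background, 1 = name,
    # 2 = gender, 3 = ethnicity, 4 = address, 5 = identity card number.
    def rectangles_to_mask(height, width, annotations):
        # `annotations` is assumed to be a list of
        # (x_min, y_min, x_max, y_max, class_value) tuples exported
        # from a labeling tool such as LabelImg.
        mask = np.zeros((height, width), dtype=np.uint8)
        for x_min, y_min, x_max, y_max, class_value in annotations:
            mask[y_min:y_max, x_min:x_max] = class_value
        return mask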
S2: inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on an atrous spatial pyramid pooling (ASPP) module to obtain a first prediction result.
In this embodiment, semantic segmentation is a typical computer vision problem: raw data such as a plane image is taken as input and converted into a mask with highlighted regions of interest. A commonly used approach is full-pixel semantic segmentation, in which each pixel in an image is assigned a class ID according to the object of interest to which it belongs, i.e., each pixel is classified as belonging to an object class. Semantic segmentation plays a major role in autonomous driving (street-view recognition and understanding), unmanned aerial vehicle applications (landing-site judgment) and wearable devices. This application constructs an identity card segmentation model based on atrous spatial pyramid pooling and combines it with the identity card image training set to segment each key information field from the identity card.
Specifically, the step of inputting the identity card picture into the preset ASPP-based identity card segmentation model to obtain a first prediction result includes:
performing, through the identity card segmentation model, multiple downsampling on the identity card picture to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result;
upsampling the output result to obtain an upsampled result;
acquiring the identity card picture after the first downsampling, and performing low-level feature collection and convolution on the identity card picture after the first downsampling to obtain a convolution result;
and splicing the upsampled result and the convolution result to obtain a spliced result, and sequentially performing convolution and upsampling on the spliced result to obtain the first prediction result.
In this embodiment, low-level feature collection and convolution are performed on the identity card picture after the first downsampling, and the result is spliced with the output of the atrous spatial pyramid pooling (ASPP) module. For a given input, ASPP performs atrous convolution at different sampling rates, which is equivalent to capturing the context of the image at multiple scales. Usually, the features produced by multiple downsampling in the backbone network are spliced with the features output by ASPP. However, the three key fields of name, gender and ethnicity each occupy less than 1% of the whole identity card picture and are therefore small target objects; after multiple downsampling much of their position information is lost, so position information is preserved here to improve the segmentation of small target objects. Low-level features refer to small detailed information in the image, such as edges, corners, pixels and gradients, which can be collected by filters.
In addition, the specific values in the steps of performing multiple downsampling on the identity card picture to obtain a downsampling result, inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result, and upsampling the output result to obtain an upsampled result are as follows: the identity card picture is downsampled 2× four times, i.e., 16× in total, and the downsampling result is input into the ASPP module of the identity card segmentation model to obtain the output result; the output result is then upsampled 8× to obtain the upsampled result.
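A minimal PyTorch sketch of this flow follows. Only the sampling factors (four 2× downsamplings for 16× in total, 8× upsampling after ASPP, splicing with features taken after the first downsampling, and a final 2× upsampling after a 3×5 convolution) follow the text; the backbone layers, channel widths and number of classes are assumptions, and an ASPP module is sketched separately below.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class IDCardSegModel(nn.Module):
        # Encoder-decoder sketch: four 2x downsamplings (16x total), ASPP,
        # 8x upsampling, splicing with low-level features taken after the
        # first downsampling, a 3x5 convolution, then a final 2x upsampling.
        def __init__(self, num_classes=6, aspp=None):  # 5 fields + background (assumption)
            super().__init__()
            chs = [3, 32, 64, 128, 256]  # channel widths are assumptions
            self.stages = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(chs[i], chs[i + 1], 3, stride=2, padding=1),
                    nn.BatchNorm2d(chs[i + 1]),
                    nn.ReLU(inplace=True),
                )
                for i in range(4)
            )
            self.aspp = aspp if aspp is not None else nn.Conv2d(256, 256, 1)  # stand-in
            self.low_proj = nn.Conv2d(32, 48, 1)  # 1x1 conv on the low-level features
            self.fuse = nn.Conv2d(256 + 48, num_classes, (3, 5), padding=(1, 2))

        def forward(self, x):
            low = None
            for i, stage in enumerate(self.stages):
                x = stage(x)
                if i == 0:
                    low = x                       # picture after the first downsampling
            x = self.aspp(x)                      # ASPP on the 16x-downsampled features
            x = F.interpolate(x, scale_factor=8.0, mode='bilinear', align_corners=False)
            x = torch.cat([x, self.low_proj(low)], dim=1)  # splice with low-level features
            x = self.fuse(x)                      # 3x5 convolution
            return F.interpolate(x, scale_factor=2.0, mode='bilinear', align_corners=False)

With the ASPP sketch given below, the model could be built as IDCardSegModel(aspp=ASPP(256, 256)).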
The step of inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result includes:
inputting the downsampling result respectively into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 convolution kernel using atrous convolution at a first sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a second sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a third sampling rate, and a pooling layer, to obtain respective intermediate results;
and splicing all intermediate results, and passing the spliced result sequentially through two convolution layers with 3×5 convolution kernels and one convolution layer with a 1×1 convolution kernel to obtain the output result.
In this embodiment, two 3×5 convolution layers are added before the 1×1 convolution of ASPP to obtain richer semantic features. The sampling is controlled by the rate parameter: when the rate is 1, no information in the original image is lost and the operation is a standard convolution; when the rate is greater than 1, for example 2, the original image is sampled every other pixel (i.e., rate minus 1 pixels are skipped). The first, second and third sampling rates are all different in this application: the first sampling rate is 6, the second is 12 and the third is 16. In practical applications, the three sampling rates can be adjusted according to actual requirements. Setting different sampling rates yields richer image features.
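Under the same assumptions, the ASPP module itself could be sketched as follows, with exactly the branches listed above: a 1×1 convolution, three 3×5 atrous convolutions at sampling rates 6, 12 and 16, and a pooling branch, the spliced result passing through two 3×5 convolutions and one 1×1 convolution.

    import torch
    import torch.nn as nn

    class ASPP(nn.Module):
        # ASPP sketch: a 1x1 convolution, three 3x5 atrous convolutions at
        # sampling rates 6, 12 and 16, and a pooling branch; the spliced
        # result passes through two 3x5 convolutions and a 1x1 convolution.
        # Channel widths are assumptions.
        def __init__(self, in_ch=256, out_ch=256):
            super().__init__()
            def atrous_branch(rate):
                # this padding keeps the spatial size for a 3x5 kernel at the given rate
                return nn.Conv2d(in_ch, out_ch, (3, 5),
                                 padding=(rate, 2 * rate), dilation=rate)
            self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1)
            self.atrous = nn.ModuleList([atrous_branch(r) for r in (6, 12, 16)])
            self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(in_ch, out_ch, 1))
            self.project = nn.Sequential(
                nn.Conv2d(5 * out_ch, out_ch, (3, 5), padding=(1, 2)),
                nn.Conv2d(out_ch, out_ch, (3, 5), padding=(1, 2)),
                nn.Conv2d(out_ch, out_ch, 1),
            )

        def forward(self, x):
            pooled = nn.functional.interpolate(self.pool(x), size=x.shape[-2:],
                                               mode='bilinear', align_corners=False)
            branches = [self.conv1x1(x)] + [b(x) for b in self.atrous] + [pooled]
            return self.project(torch.cat(branches, dim=1))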
The step of performing low-level feature collection and convolution on the identity card picture after the first downsampling to obtain a convolution result includes:
performing low-level feature collection on the identity card picture after the first downsampling and inputting the result into a convolution layer with a 1×1 convolution kernel to obtain the convolution result;
the step of sequentially performing convolution and upsampling on the spliced result to obtain the first prediction result includes:
convolving the spliced result with a 3×5 convolution kernel and performing 2× upsampling to obtain the first prediction result.
In this embodiment, since this application performs whole-field segmentation of the identity card and the segmentation targets of the identity card are all elongated, a convolution kernel of size 3×5 is set in the identity card segmentation model so that the receptive field is larger along the direction of the identity card number (i.e., the horizontal direction).
S3: and calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, so as to obtain a trained identity card segmentation model.
In this embodiment, the preset convergence condition may be reaching a preset number of iterations T, or another stop condition set according to actual needs. The Dice loss may be used as the loss function. Dice loss was proposed for scenarios where the foreground proportion is very small, which suits the identity card segmentation scene of this application. The Dice coefficient originates from binary classification and essentially measures the overlap of two samples. The formula is: Dice Loss = 1 - 2|A ∩ B| / (|A| + |B|),
where A is the first prediction result of the model and B is the content labeled by the user in the identity card image training set.
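A direct sketch of this loss in PyTorch (the flattening of prediction and mask, and the small epsilon, are assumptions):

    import torch

    def dice_loss(pred, target, eps=1e-6):
        # Dice Loss = 1 - 2|A ∩ B| / (|A| + |B|): `pred` holds the model's
        # per-pixel foreground probabilities (A) and `target` the
        # user-labeled mask (B); `eps` guards against division by zero.
        pred = pred.reshape(-1).float()
        target = target.reshape(-1).float()
        intersection = (pred * target).sum()
        return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)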
S4: and acquiring an identity card picture to be identified, inputting the identity card picture to be identified into the trained identity card segmentation model, obtaining a second prediction result, and obtaining a plurality of label frames based on the second prediction result.
In this embodiment, the identity card picture to be recognized is input into the trained identity card segmentation model to obtain the second prediction result, which is output as a mask segmentation map. The tag frames (bounding boxes) can then be obtained from the mask by image-processing methods such as bounding-box extraction.
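As a sketch of this post-processing step (using OpenCV contour extraction, which is one possible image-processing method and an assumption here), the tag frames could be recovered from the mask segmentation map as follows:

    import cv2
    import numpy as np

    def mask_to_tag_frames(mask):
        # Recover (x, y, w, h, class_value) tag frames from the mask
        # segmentation map; class value 0 is assumed to be background.
        frames = []
        for class_value in np.unique(mask):
            if class_value == 0:
                continue
            binary = (mask == class_value).astype(np.uint8)
            contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            for contour in contours:
                x, y, w, h = cv2.boundingRect(contour)
                frames.append((x, y, w, h, int(class_value)))
        return frames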
S5: and screening the plurality of tag frames to obtain a target tag frame, and intercepting key fields in the identity card picture based on the target tag frame.
In this embodiment, the multiple tag frames are screened to obtain the target tag frames, which indicate the specific position of each key field in the picture. The corresponding field pictures are then cropped from the original picture according to these positions and sent to the character recognition model.
Specifically, the second prediction result includes a class value, and the step of screening the plurality of tag frames to obtain a target tag frame and intercepting key fields in the identification card picture based on the target tag frame includes:
Obtaining class values corresponding to all tag frames in the second prediction result;
classifying the tag frames based on the class values to obtain a plurality of classes, wherein the class values in each class are the same;
in the same category, taking the tag frame meeting the preset condition as the target tag frame of the current category until the target tag frame of each category is respectively determined;
obtaining the position of a key field corresponding to each category based on all the target tag frames;
and based on the position of the key field, intercepting the key field in the identity card picture.
In this embodiment, because the minority-nationality identity card contains not only Chinese but also minority-script text, the model does not output only one tag frame for the same category. The tag frames output by the model therefore need to be screened, and the target tag frames meeting the preset condition are selected, so that the positions of the key fields to be recognized are located through the target tag frames.
The step of taking the tag frame meeting the preset condition as the target tag frame of the current category comprises the following steps:
identifying an area of the tag frame in each category;
And taking the label frame with the largest area as the target label frame of the current category.
In this embodiment, in general there is only one tag frame with the largest area in each category, so the tag frame with the largest area can be taken directly as the target tag frame of the current category, which allows the computer to select the target tag frame quickly.
Correspondingly, the step of taking the label frame meeting the preset condition as the target label frame of the current category comprises the following steps:
identifying an area of the tag frame in each category;
and taking the label frame with the area exceeding the preset threshold value as the target label frame of the current category.
In this embodiment, a threshold may alternatively be set, and all tag frames whose area exceeds the preset threshold are taken as target tag frames of the current category. The key fields corresponding to the tag frames larger than the threshold are intercepted and sent into the text recognition model for recognition, and the computer determines the final output after judging the content output by the text recognition model.
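Both screening strategies just described (the largest-area frame per category, or every frame whose area exceeds a preset threshold) can be sketched together; the (x, y, w, h, class_value) frame layout is carried over from the earlier sketch:

    from collections import defaultdict

    def screen_tag_frames(frames, area_threshold=None):
        # Without a threshold, keep the largest-area frame per category;
        # with one, keep every frame whose area exceeds it (the
        # minority-script case may yield several frames per category).
        grouped = defaultdict(list)
        for frame in frames:
            grouped[frame[4]].append(frame)
        targets = []
        for class_value, group in grouped.items():
            if area_threshold is None:
                targets.append(max(group, key=lambda f: f[2] * f[3]))
            else:
                targets.extend(f for f in group if f[2] * f[3] > area_threshold)
        return targets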
S6: inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In this embodiment, the text recognition model may be CTPN+CRNN, a deep-learning OCR character recognition method in which CTPN detects the characters and CRNN recognizes them; the combination performs well in character recognition. CTPN (Connectionist Text Proposal Network, from "Detecting Text in Natural Image with Connectionist Text Proposal Network") mainly locates the text lines in a picture accurately. CRNN (Convolutional Recurrent Neural Network) has the advantages of fast character recognition, good performance and a small model.
Specifically, the step of inputting the intercepted key field into the pre-trained text recognition model to obtain a recognition result includes:
determining a target label frame corresponding to the intercepted key field and a class value carried by the target label frame;
determining whether there is a duplicate class value;
when no repeated class value exists, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when repeated class values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining the preset character set corresponding to each result to be judged based on the class value;
identifying whether the characters in each result to be judged appear in the corresponding character set;
and taking the result to be judged whose characters appear in the character set as the recognition result.
In this embodiment, a result to be judged that contains no character from the character set is treated as an erroneous result, and all recognition results are output once every result to be judged has been processed. To illustrate: the prediction result output by the identity card segmentation model may contain several tag frames for the same category (such as the gender category), and these need further screening to determine the target tag frames of that category. On a common Han-nationality identity card there is generally only one tag frame exceeding the preset threshold in a category, and it can directly serve as the target tag frame. A minority-nationality identity card, however, carries both Chinese characters and the corresponding minority-script characters: in the gender category, for example, the card bears the Chinese word for "gender" together with the minority-script word for "gender", so several tag frames exceed the preset threshold (the Chinese text under the category corresponds to one tag frame and the minority-script text under the same category to another). All tag frames exceeding the preset threshold in the category are taken as target tag frames, the corresponding key fields are input into the text recognition model, and the characters output by the model are finally judged to obtain the recognition result.
A concrete example: for the same category of a minority-nationality identity card, two target tag frames are obtained, the corresponding key fields are intercepted and input into the text recognition model, and the two outputs, say "male" and "black", serve as results to be judged. The field "black" appears because the text recognition model is inaccurate when recognizing minority-script words. Since gender can only be the character "male" or "female", and a gender character set exists, the character set is used for the judgment: only the result whose field is "male" contains a character from the character set, so it is taken as the recognition result, and the result whose field is "black" is treated as an erroneous result.
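A sketch of this judgment logic, assuming a recognize callable that stands in for the pre-trained text recognition model and a character set defined only for the gender class:

    from collections import Counter

    # Character set per class value; only gender (class value 2) is shown,
    # matching the example above. The mapping and the `recognize` callable
    # are assumptions.
    ALLOWED_CHARS = {2: set("男女")}  # the characters for "male" / "female"

    def judge_results(crops_with_class, recognize):
        counts = Counter(class_value for _, class_value in crops_with_class)
        results = {}
        for crop, class_value in crops_with_class:
            text = recognize(crop)
            if counts[class_value] == 1:
                results[class_value] = text  # no repeated class value
            elif text and set(text) <= ALLOWED_CHARS.get(class_value, set()):
                results[class_value] = text  # keeps "male", drops garbled output
        return results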
Compared with traditional template matching, the identity card segmentation model is more robust; in particular, its effect on minority-nationality identity cards subject to interference is markedly better than that of template matching. Because recognition is performed by the identity card segmentation model rather than by matching against a template, tolerance to lighting, noise and slight viewing-angle changes is higher, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target tag frames. This facilitates text detection by the text recognition model and improves the accuracy of identity card recognition.
It should be emphasized that, to further ensure the privacy and security of the identification card image training set, the identification card image training set may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer and the like.
This application can be applied in the field of intelligent government affairs to recognize resident identity cards, thereby promoting the construction of smart cities.
Those skilled in the art will appreciate that implementing all or part of the processes of the methods of the embodiments described above may be accomplished by way of computer readable instructions, stored on a computer readable storage medium, which when executed may comprise the processes of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or a volatile storage medium such as a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an identification card recognition device based on an identification card segmentation model, where an embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be specifically applied to various electronic devices.
As shown in fig. 3, the identification card recognition device 300 based on the identification card segmentation model according to this embodiment includes: a picture labeling module 301, a first input module 302, a loss calculation module 303, a second input module 304, a target screening module 305, and a text recognition module 306. Wherein: the picture labeling module 301 is configured to receive an identity card picture set, label the identity card picture set, and generate an identity card image training set; the first input module 302 is configured to input the identity card images in the identity card image training set into a preset identity card segmentation model based on an atrous spatial pyramid pooling (ASPP) module to obtain a first prediction result; the loss calculation module 303 is configured to calculate a loss function based on the identity card image training set and the first prediction result, and to iteratively train the identity card segmentation model until a preset convergence condition is reached, obtaining a trained identity card segmentation model; the second input module 304 is configured to acquire an identity card picture to be recognized, input it into the trained identity card segmentation model to obtain a second prediction result, and obtain a plurality of tag frames based on the second prediction result; the target screening module 305 is configured to screen the plurality of tag frames to obtain target tag frames, and to intercept key fields in the identity card picture based on the target tag frames; and the text recognition module 306 is configured to input the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In this embodiment, compared with traditional template matching, the identity card segmentation model is more robust; in particular, its effect on minority-nationality identity cards subject to interference is markedly better than that of template matching. Because recognition is performed by the identity card segmentation model rather than by matching against a template, tolerance to lighting, noise and slight viewing-angle changes is higher, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target tag frames. This facilitates text detection by the text recognition model and improves the accuracy of identity card recognition.
The picture labeling module 301 includes a notification sub-module, an identification sub-module, and a generation sub-module. The notification sub-module is used for displaying the identity card picture set in a preset front-end page and notifying the user to label the identity card picture set; the identification sub-module is used for identifying the content labeled by the user after recognizing that the user has finished labeling each identity card picture in the identity card picture set with a preset labeling tool in the front-end page, where the content labeled by the user includes a user-drawn rectangular frame and the class value of the image inside the rectangular frame; the generation sub-module is used for generating a mask on each identity card picture based on the content labeled by the user, thereby finishing labeling the identity card picture set and obtaining the identity card image training set.
The first input module 302 includes a downsampling sub-module, an upsampling sub-module, a collection sub-module, and a splicing sub-module. The downsampling sub-module is used for performing multiple downsampling on the identity card picture through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result; the upsampling sub-module is used for upsampling the output result to obtain an upsampled result; the collection sub-module is used for acquiring the identity card picture after the first downsampling, and performing low-level feature collection and convolution on it to obtain a convolution result; and the splicing sub-module is used for splicing the upsampled result and the convolution result to obtain a spliced result, and sequentially performing convolution and upsampling on the spliced result to obtain the first prediction result.
The downsampling sub-module includes an input unit and a splicing unit. The input unit is used for inputting the downsampling result respectively into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 convolution kernel using atrous convolution at a first sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a second sampling rate, a convolution layer with a 3×5 convolution kernel using atrous convolution at a third sampling rate, and a pooling layer, to obtain respective intermediate results; the splicing unit is used for splicing all intermediate results and passing the spliced result sequentially through two convolution layers with 3×5 convolution kernels and one convolution layer with a 1×1 convolution kernel to obtain the output result.
In some optional implementations of this embodiment, the collection sub-module is further configured to: perform low-level feature collection on the identity card picture after the first downsampling and input the result into a convolution layer with a 1×1 convolution kernel to obtain the convolution result.
In some optional implementations of this embodiment, the splicing sub-module is further configured to: convolve the spliced result with a 3×5 convolution kernel and perform 2× upsampling to obtain the first prediction result.
The target screening module 305 includes an obtaining sub-module, a classifying sub-module, a determining sub-module, a position sub-module, and an intercepting sub-module. The obtaining sub-module is used for obtaining the class values corresponding to all tag frames in the second prediction result; the classifying sub-module is used for classifying the tag frames based on the class values to obtain a plurality of categories, where the class values within each category are the same; the determining sub-module is used for taking, within the same category, the tag frame meeting the preset condition as the target tag frame of the current category, until the target tag frame of each category has been determined; the position sub-module is used for obtaining the position of the key field corresponding to each category based on all the target tag frames; and the intercepting sub-module is used for intercepting the key fields in the identity card picture based on the positions of the key fields.
In some optional implementations of this embodiment, the determining sub-module is further configured to: identify the area of the tag frames in each category, and take the tag frame with the largest area as the target tag frame of the current category.
In some optional implementations of this embodiment, the determining sub-module is further configured to: identify the area of the tag frames in each category, and take each tag frame whose area exceeds a preset threshold as a target tag frame of the current category.
The text recognition module 306 includes a category sub-module, a first judging sub-module, a first recognition sub-module, a second recognition sub-module, a character set sub-module, a second judging sub-module, and a result sub-module. The category sub-module is used for determining the target tag frame corresponding to the intercepted key field and the class value carried by the target tag frame; the first judging sub-module is used for determining whether repeated class values exist; the first recognition sub-module is used for inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result when no repeated class value exists; the second recognition sub-module is used for inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged when repeated class values exist; the character set sub-module is used for determining the preset character set corresponding to each result to be judged based on the class value; the second judging sub-module is used for identifying whether the characters in each result to be judged appear in the corresponding character set; and the result sub-module is used for taking the result to be judged whose characters appear in the character set as the recognition result.
Compared with traditional template matching, the identity card segmentation model is more robust; in particular, its effect on minority-nationality identity cards subject to interference is markedly better than that of template matching. Because recognition is performed by the identity card segmentation model rather than by matching against a template, tolerance to lighting, noise and slight viewing-angle changes is higher, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target tag frames. This facilitates text detection by the text recognition model and improves the accuracy of identity card recognition.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 200 includes a memory 201, a processor 202, and a network interface 203 communicatively coupled to each other via a system bus. It should be noted that only a computer device 200 having components 201-203 is shown in the figure, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 200. Of course, the memory 201 may also include both an internal storage unit of the computer device 200 and an external storage device. In this embodiment, the memory 201 is generally used to store the operating system and the various application software installed on the computer device 200, such as the computer readable instructions of the identity card recognition method based on the identity card segmentation model. In addition, the memory 201 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 202 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute the computer-readable instructions stored in the memory 201 or to process data, for example, to execute the computer-readable instructions of the identification card recognition method based on the identification card segmentation model.
The network interface 203 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 200 and other electronic devices.
In this embodiment, the identification card segmentation model is designed specifically for the attribute characteristics of the identification card, which improves the segmentation precision of the model. Combined with the text recognition model, this effectively improves the accuracy of identification card recognition.
The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor, so that the at least one processor performs the steps of the identification card recognition method based on the identification card segmentation model as described above.
In this embodiment, the identification card segmentation model is designed specifically for the attribute characteristics of the identification card, which improves the segmentation precision of the model. Combined with the text recognition model, this effectively improves the accuracy of identification card recognition.
From the description of the above embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though in most cases the former is preferable. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments described above are only some, not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, but they do not limit the patent scope of the present application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (6)

1. An identification card recognition method based on an identification card segmentation model is characterized by comprising the following steps:
receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set;
inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on a cavity space convolution pooling pyramid (atrous spatial pyramid pooling, ASPP) to obtain a first prediction result;
calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, so as to obtain a trained identity card segmentation model;
acquiring an identity card picture to be identified, inputting the identity card picture to be identified into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of tag frames based on the second prediction result;
screening the plurality of tag frames to obtain a target tag frame, and intercepting key fields in the identity card picture based on the target tag frame;
inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result;
the step of inputting the identity card images in the identity card image training set into the preset identity card segmentation model based on the cavity space convolution pooling pyramid to obtain the first prediction result comprises the following steps:
performing repeated downsampling on the identity card image through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result respectively into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 convolution kernel based on hole convolution at a first sampling rate, a convolution layer with a 3×5 convolution kernel based on hole convolution at a second sampling rate, a convolution layer with a 3×5 convolution kernel based on hole convolution at a third sampling rate, and a pooling layer, to obtain intermediate results;
splicing all the intermediate results, and sequentially inputting the spliced result into two convolution layers with 3×5 convolution kernels and one convolution layer with a 1×1 convolution kernel to obtain an output result;
upsampling the output result to obtain an upsampled result;
acquiring the identity card picture after the first downsampling, performing low-level feature extraction on the identity card picture after the first downsampling, and inputting the extracted low-level features into a convolution layer with a 1×1 convolution kernel to obtain a convolution result;
splicing the upsampled result and the convolution result to obtain a spliced result, convolving the spliced result with a 3×5 convolution kernel, and performing two-fold upsampling to obtain the first prediction result (an illustrative code sketch of these steps is given after claim 1 below);
the second prediction result includes class values, and the step of screening the plurality of tag frames to obtain a target tag frame and intercepting the key fields in the identity card picture based on the target tag frame comprises:
obtaining the class value corresponding to each tag frame in the second prediction result;
classifying the tag frames based on the class values to obtain a plurality of classes, wherein the class values within each class are the same;
within each class, taking the tag frame that meets a preset condition as the target tag frame of that class, until the target tag frame of every class has been determined;
obtaining the position of the key field corresponding to each class based on all the target tag frames; and
intercepting the key fields in the identity card picture based on the positions of the key fields.
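For concreteness, the following is a minimal, non-authoritative PyTorch sketch of the segmentation head recited in claim 1. The claim fixes the branch structure (a 1×1 convolution, three 3×5 hole convolutions at different sampling rates, and a pooling branch), the splice-and-convolve sequence, and the low-level-feature path; the dilation rates (6/12/18), channel widths, bilinear interpolation, and backbone interface used below are illustrative assumptions, not part of the claim.

```python
# Illustrative sketch only: rates, channel widths, and interpolation mode are
# assumptions; the 1x1 and 3x5 kernels and the branch layout follow claim 1.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPHead(nn.Module):
    def __init__(self, in_ch=256, low_ch=64, mid_ch=256, num_classes=8,
                 rates=(6, 12, 18)):  # assumed sampling rates
        super().__init__()
        # Branch 1: preset 1x1 convolution layer.
        self.b1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # Branches 2-4: 3x5 hole (atrous) convolutions at three sampling rates.
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, kernel_size=(3, 5),
                      dilation=r, padding=(r, 2 * r))
            for r in rates
        ])
        # Branch 5: image-level pooling layer.
        self.pool_conv = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # After splicing: two 3x5 convolution layers, then a 1x1 convolution layer.
        self.merge = nn.Sequential(
            nn.Conv2d(5 * mid_ch, mid_ch, kernel_size=(3, 5), padding=(1, 2)),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=(3, 5), padding=(1, 2)),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
        )
        # Low-level features from the first downsampling, through a 1x1 convolution.
        self.low_proj = nn.Conv2d(low_ch, 48, kernel_size=1)
        # Final 3x5 convolution over the spliced decoder features.
        self.decode = nn.Conv2d(mid_ch + 48, num_classes,
                                kernel_size=(3, 5), padding=(1, 2))

    def forward(self, deep_feat, low_feat):
        branches = [self.b1(deep_feat)]
        branches += [conv(deep_feat) for conv in self.atrous]
        pooled = self.pool_conv(F.adaptive_avg_pool2d(deep_feat, 1))
        branches.append(F.interpolate(pooled, size=deep_feat.shape[2:],
                                      mode="bilinear", align_corners=False))
        out = self.merge(torch.cat(branches, dim=1))       # splice intermediate results
        out = F.interpolate(out, size=low_feat.shape[2:],  # upsample the output result
                            mode="bilinear", align_corners=False)
        out = torch.cat([out, self.low_proj(low_feat)], dim=1)  # splice low features
        out = self.decode(out)                             # 3x5 convolution
        return F.interpolate(out, scale_factor=2.0,        # two-fold upsampling
                             mode="bilinear", align_corners=False)

# Example: deep_feat from the repeated downsampling, low_feat from the first.
# logits = ASPPHead()(torch.randn(1, 256, 32, 52), torch.randn(1, 64, 128, 208))
```

A per-pixel argmax over the returned logits would give the class map from which the tag frames of the second prediction result are derived.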
2. The identification card recognition method based on the identification card segmentation model according to claim 1, wherein the step of inputting the intercepted key fields into the pre-trained text recognition model to obtain the recognition result comprises the following steps:
determining the target tag frame corresponding to each intercepted key field and the class value carried by that target tag frame;
determining whether duplicate class values exist;
when no duplicate class value exists, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when duplicate class values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining the preset character set corresponding to the results to be judged based on the class value;
identifying whether the corresponding characters of the character set exist in each result to be judged; and
taking the result to be judged that contains the corresponding characters of the character set as the recognition result.
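A minimal sketch of the disambiguation in claim 2 follows, assuming hypothetical per-class character sets (for example, a gender field would be expected to contain the character 男 or 女); the claim requires only that each class value map to some preset character set.

```python
# Hypothetical character sets keyed by class value; illustrative assumptions only.
from typing import Dict, List, Optional, Set

CHARSETS: Dict[int, Set[str]] = {
    2: set("男女"),           # e.g. a gender field
    4: set("0123456789X"),    # e.g. a citizen ID number field
}

def pick_result(class_value: int, candidates: List[str]) -> Optional[str]:
    """Among several results to be judged for one class value, keep the one
    containing characters from the class's preset character set."""
    if len(candidates) == 1:      # no duplicate class value: use the sole result
        return candidates[0]
    charset = CHARSETS.get(class_value, set())
    for text in candidates:
        if any(ch in charset for ch in text):
            return text           # the result containing the expected characters
    return None                   # no candidate matched; leave undecided
```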
3. The identification card recognition method based on the identification card segmentation model according to claim 1, wherein the step of labeling the identity card picture set to obtain the identity card image training set comprises:
displaying the identity card picture set in a preset front-end page, and notifying a user to label the identity card picture set;
after identifying that the user has finished labeling each identity card picture in the identity card picture set through a preset labeling tool in the front-end page, identifying the content labeled by the user, wherein the content labeled by the user includes a rectangular frame defined by the user and the class value of the image within the rectangular frame; and
generating a mask on each identity card picture based on the content labeled by the user, thereby completing the labeling of the identity card picture set and obtaining the identity card image training set.
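As a rough illustration of the mask generation in claim 3, each user-drawn rectangle and its class value can be painted into a single-channel label mask that serves as the segmentation training target; the array layout and the use of 0 for background are assumptions.

```python
# Sketch under stated assumptions: 0 = background, class values start at 1.
import numpy as np
from typing import List, Tuple

def make_mask(height: int, width: int,
              rects: List[Tuple[int, int, int, int, int]]) -> np.ndarray:
    """rects holds (x1, y1, x2, y2, class_value) entries captured from the
    front-end labeling tool."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2, class_value in rects:
        mask[y1:y2, x1:x2] = class_value  # paint the labeled region
    return mask
```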
4. An identification card recognition device based on an identification card segmentation model is characterized by comprising:
the image labeling module is used for receiving the identity card picture set, and labeling the identity card picture set to obtain the identity card image training set;
the first input module is used for inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on a cavity space convolution pooling pyramid to obtain a first prediction result;
the loss calculation module is used for calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, so as to obtain a trained identity card segmentation model;
the second input module is used for acquiring an identity card picture to be identified, inputting the identity card picture to be identified into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of tag frames based on the second prediction result;
the target screening module is used for screening the plurality of tag frames to obtain a target tag frame, and intercepting key fields in the identity card picture based on the target tag frame; and
the text recognition module is used for inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result;
the first input module is further used for performing repeated downsampling on the identity card picture through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result respectively into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 convolution kernel based on hole convolution at a first sampling rate, a convolution layer with a 3×5 convolution kernel based on hole convolution at a second sampling rate, a convolution layer with a 3×5 convolution kernel based on hole convolution at a third sampling rate, and a pooling layer to obtain intermediate results; splicing all the intermediate results, and sequentially inputting the spliced result into two convolution layers with 3×5 convolution kernels and one convolution layer with a 1×1 convolution kernel to obtain an output result; upsampling the output result to obtain an upsampled result; acquiring the identity card picture after the first downsampling, performing low-level feature extraction on it, and inputting the extracted low-level features into a convolution layer with a 1×1 convolution kernel to obtain a convolution result; and splicing the upsampled result and the convolution result to obtain a spliced result, convolving the spliced result with a 3×5 convolution kernel, and performing two-fold upsampling to obtain the first prediction result;
the second prediction result includes class values, and the target screening module is further used for obtaining the class value corresponding to each tag frame in the second prediction result; classifying the tag frames based on the class values to obtain a plurality of classes, wherein the class values within each class are the same; within each class, taking the tag frame that meets a preset condition as the target tag frame of that class, until the target tag frame of every class has been determined; obtaining the position of the key field corresponding to each class based on all the target tag frames; and intercepting the key fields in the identity card picture based on the positions of the key fields.
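The screening performed by the target screening module (and by the corresponding step of claim 1) can be sketched as follows; the claim leaves the "preset condition" open, so the largest-area rule below is an assumption for illustration only.

```python
# Group tag frames by class value and keep one target frame per class; the
# largest-area rule stands in for the unspecified preset condition.
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def screen_tag_frames(frames: List[Tuple[Box, int]]) -> Dict[int, Box]:
    by_class: Dict[int, List[Box]] = defaultdict(list)
    for box, class_value in frames:            # classify frames by class value
        by_class[class_value].append(box)

    def area(b: Box) -> int:
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    return {c: max(boxes, key=area) for c, boxes in by_class.items()}

def intercept_key_fields(picture: np.ndarray,
                         targets: Dict[int, Box]) -> Dict[int, np.ndarray]:
    """Cut each key field out of the identity card picture (H x W x C array)."""
    return {c: picture[y1:y2, x1:x2] for c, (x1, y1, x2, y2) in targets.items()}
```

The cropped arrays would then be passed to the text recognition model as the intercepted key fields.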
5. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which when executed by the processor implement the steps of the identification card recognition method based on an identification card segmentation model as claimed in any one of claims 1 to 3.
6. A computer readable storage medium, wherein computer readable instructions are stored on the computer readable storage medium, which when executed by a processor, implement the steps of the identification card recognition method based on the identification card segmentation model according to any one of claims 1 to 3.
CN202011286478.7A 2020-11-17 2020-11-17 Identification card recognition method based on identification card segmentation model and related equipment thereof Active CN112396060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011286478.7A CN112396060B (en) 2020-11-17 2020-11-17 Identification card recognition method based on identification card segmentation model and related equipment thereof

Publications (2)

Publication Number Publication Date
CN112396060A CN112396060A (en) 2021-02-23
CN112396060B true CN112396060B (en) 2024-03-15

Family

ID=74605844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011286478.7A Active CN112396060B (en) 2020-11-17 2020-11-17 Identification card recognition method based on identification card segmentation model and related equipment thereof

Country Status (1)

Country Link
CN (1) CN112396060B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051901B (en) * 2021-03-26 2023-03-24 重庆紫光华山智安科技有限公司 Identification card text recognition method, system, medium and electronic terminal
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635927B2 (en) * 2017-03-06 2020-04-28 Honda Motor Co., Ltd. Systems for performing semantic segmentation and methods thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN111144400A (en) * 2018-11-06 2020-05-12 北京金山云网络技术有限公司 Identification method and device for identity card information, terminal equipment and storage medium
CN110136136A (en) * 2019-05-27 2019-08-16 北京达佳互联信息技术有限公司 Scene Segmentation, device, computer equipment and storage medium
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN111507993A (en) * 2020-03-18 2020-08-07 南方电网科学研究院有限责任公司 Image segmentation method and device based on generation countermeasure network and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Semantic Segmentation Algorithms Based on Fully Convolutional Neural Networks; 戴伟达 (Dai Weida); China Master's Theses Full-text Database (Information Science and Technology), 2020-02-15, No. 2, pp. 4-58 *

Also Published As

Publication number Publication date
CN112396060A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
US20200364802A1 (en) Processing method, processing apparatus, user terminal and server for recognition of vehicle damage
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
WO2021135879A1 (en) Vehicle data monitoring method and apparatus, computer device, and storage medium
Yin et al. An optimised multi-scale fusion method for airport detection in large-scale optical remote sensing images
US8971646B1 (en) Face and license plate detection in street level images with 3-D road width features estimated from laser data
CN110751037A (en) Method for recognizing color of vehicle body and terminal equipment
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN112396060B (en) Identification card recognition method based on identification card segmentation model and related equipment thereof
CN113837151B (en) Table image processing method and device, computer equipment and readable storage medium
Almagbile Estimation of crowd density from UAVs images based on corner detection procedures and clustering analysis
CN110263877B (en) Scene character detection method
US20230041943A1 (en) Method for automatically producing map data, and related apparatus
CN112016502B (en) Safety belt detection method, safety belt detection device, computer equipment and storage medium
CN111209856B (en) Invoice information identification method and device, electronic equipment and storage medium
Zhang et al. Feature extraction for high-resolution imagery based on human visual perception
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN114663871A (en) Image recognition method, training method, device, system and storage medium
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
Iparraguirre et al. Sensors on the move: onboard camera-based real-time traffic alerts paving the way for cooperative roads
US20240037911A1 (en) Image classification method, electronic device, and storage medium
CN112581344A (en) Image processing method and device, computer equipment and storage medium
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium
Zaafouri et al. A vehicle license plate detection and recognition method using log gabor features and Convolutional Neural Networks
CN115880702A (en) Data processing method, device, equipment, program product and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant