CN112396060A - Identity card identification method based on identity card segmentation model and related equipment thereof - Google Patents

Identity card identification method based on identity card segmentation model and related equipment thereof

Info

Publication number
CN112396060A
Authority
CN
China
Prior art keywords
identity card
convolution
result
segmentation model
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011286478.7A
Other languages
Chinese (zh)
Other versions
CN112396060B (en)
Inventor
熊军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202011286478.7A
Publication of CN112396060A
Application granted
Publication of CN112396060B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application belongs to the technical field of artificial intelligence, is applied to the field of intelligent government affairs, and relates to an identity card identification method based on an identity card segmentation model and related equipment thereof. The identification method comprises: labeling a received identity card picture set to generate an identity card image training set; inputting the identity card images in the training set into a preset identity card segmentation model to obtain a first prediction result; iteratively training the identity card segmentation model to obtain a trained segmentation model; acquiring an identity card picture to be recognized and inputting it into the trained segmentation model to obtain a second prediction result; and screening the plurality of label frames contained in the second prediction result to obtain target label frames, intercepting the key fields in the identity card picture based on the target label frames, and inputting the key fields into a pre-trained text recognition model to obtain a recognition result. The identity card image training set may also be stored in a blockchain. The method and the device effectively improve the accuracy of identity card identification.

Description

Identity card identification method based on identity card segmentation model and related equipment thereof
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an identity card identification method based on an identity card segmentation model and related equipment thereof.
Background
The identity card is the valid certificate proving a citizen's legal identity, and it is used frequently in daily life. In travel scenarios in particular, a computer is often needed to identify and record passengers' identity cards. It is therefore important that a computer can quickly and accurately identify and read identity card information.
In the existing identity card identification method, template matching is used for character detection to obtain the key information of the identity card, and character recognition is then performed on the detected key information parts. However, conventional template matching has rather low tolerance to light, noise and slight viewing-angle changes. Identity cards are divided into Han-script identity cards and ethnic-minority identity cards; an ethnic-minority identity card carries both Mandarin characters and minority-script characters, which introduces more interference into the identification process, so the template matching method has poor robustness and the accuracy of identity card identification is low.
Disclosure of Invention
The embodiment of the application aims to provide an identity card identification method based on an identity card segmentation model and related equipment thereof, so as to improve the accuracy of identity card identification.
In order to solve the above technical problem, an embodiment of the present application provides an identity card identification method based on an identity card segmentation model, which adopts the following technical scheme:
an identity card identification method based on an identity card segmentation model comprises the following steps:
receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set;
inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on atrous spatial pyramid pooling (ASPP) to obtain a first prediction result;
calculating a loss function based on the identity card image training set and a first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached to obtain a trained identity card segmentation model;
acquiring an identity card picture to be recognized, inputting the identity card picture to be recognized into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
screening the plurality of label frames to obtain a target label frame, and intercepting key fields in the identity card picture based on the target label frame;
and inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
Further, the step of inputting the identity card picture into the preset ASPP-based identity card segmentation model to obtain a first prediction result comprises:
downsampling the identity card picture multiple times through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result;
upsampling the output result to obtain an upsampling result;
acquiring the identity card picture after the first downsampling, and performing low-level feature extraction and convolution on it to obtain a convolution result;
and concatenating the upsampling result and the convolution result to obtain a concatenation result, and sequentially performing convolution and upsampling on the concatenation result to obtain the first prediction result.
Further, the step of inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result comprises:
inputting the downsampling result separately into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 kernel using atrous convolution at a first sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at a second sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at a third sampling rate, and a pooling layer, to obtain the respective intermediate results;
and concatenating all the intermediate results and passing them sequentially through two convolution layers with 3×5 kernels and a convolution layer with a 1×1 kernel to obtain the output result.
Further, the step of performing low-level feature extraction and convolution on the identity card picture after the first downsampling to obtain a convolution result comprises:
performing low-level feature extraction on the identity card picture after the first downsampling and inputting the result into a convolution layer with a 1×1 kernel to obtain the convolution result;
the step of sequentially performing convolution and upsampling on the concatenation result to obtain the first prediction result comprises:
convolving the concatenation result with a 3×5 kernel and performing 2× upsampling to obtain the first prediction result.
Further, the second prediction result includes category values; the step of screening the plurality of label frames to obtain target label frames and intercepting the key fields in the identity card picture based on the target label frames comprises:
obtaining the category values corresponding to all label frames in the second prediction result;
grouping the label frames by category value to obtain a plurality of categories, the label frames within each category sharing the same category value;
within each category, taking the label frames meeting a preset condition as the target label frames of the current category, until target label frames are determined for every category;
obtaining the position of a key field corresponding to each category based on all target label frames;
and intercepting the key field in the identity card picture based on the position of the key field.
Further, the step of inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result comprises:
determining a target label frame corresponding to the intercepted key field and a category value carried by the target label frame;
determining whether there is a duplicate category value;
when no repeated category value exists, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when repeated category values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining a preset character set corresponding to the result to be judged based on the category value;
identifying whether the corresponding characters in the character set exist in the result to be judged;
and taking the result to be judged whose characters exist in the character set as the recognition result.
In order to solve the above technical problem, an embodiment of the present application further provides an identification card recognition apparatus based on an identification card segmentation model, which adopts the following technical scheme:
an identification card recognition device based on an identification card segmentation model, comprising:
the image marking module is used for receiving the identity card image set, marking the identity card image set and generating an identity card image training set;
the first input module is used for inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on atrous spatial pyramid pooling (ASPP) to obtain a first prediction result;
the loss calculation module is used for calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached to obtain a trained identity card segmentation model;
the second input module is used for acquiring an identity card picture to be recognized, inputting the identity card picture to be recognized into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
the target screening module is used for screening the plurality of label frames to obtain a target label frame and intercepting key fields in the identity card picture based on the target label frame; and
and the text recognition module is used for inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device includes a memory and a processor, the memory stores computer readable instructions, and the processor implements the steps of the identification card identification method based on the identification card segmentation model when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the identification card identification method based on the identification card segmentation model.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
compared with the traditional template matching, the identity card segmentation model has stronger robustness, and particularly has an effect obviously superior to that of the template matching on the interfering minority nationality identity card. Because the identification is carried out through the identity card segmentation model instead of the matching through the template, the tolerance on light, noise and slight visual angle change is higher, and the positions of the key fields in the identity card picture are positioned through the identity card segmentation model and the target label frame. The character detection of the text recognition model can be better facilitated, and the accuracy of identification card recognition is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an identification card recognition method based on an identification card segmentation model according to the application;
FIG. 3 is a schematic diagram of an embodiment of an identification card recognition apparatus based on an identification card segmentation model according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. a computer device; 201. a memory; 202. a processor; 203. a network interface; 300. an identification card recognition device based on an identification card segmentation model; 301. a picture marking module; 302. a first input module; 303. a loss calculation module; 304. a second input module; 305. a target screening module; 306. and a text recognition module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, MPEG compression standard audio layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, MPEG compression standard audio layer 4), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the identification card recognition method based on the identification card segmentation model provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the identification card recognition apparatus based on the identification card segmentation model is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of an identification card recognition method based on an identification card segmentation model in accordance with the present application is shown. The identification card recognition method based on the identification card segmentation model comprises the following steps:
s1: and receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set.
In this embodiment, the identity card image set is used as a training sample in supervised model training, and the labeling result is used as a real situation for calculating a loss function.
In this embodiment, an electronic device (for example, the server/terminal device shown in fig. 1) on which the identity card identification method based on the identity card segmentation model operates may receive the identity card picture set through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
Specifically, the step of marking the identity card image set to obtain the identity card image training set includes:
displaying the identity card picture set in a preset front-end page, and prompting the user to label the identity card picture set;
upon recognizing that the user has labeled every identity card picture in the identity card picture set with a preset labeling tool in the front-end page, identifying the content labeled by the user, the content comprising a rectangular frame and a user-defined category value for the image inside the rectangular frame;
and generating a mask on each identity card picture based on the content marked by the user, finishing the marking of the identity card picture set, and obtaining the identity card image training set.
In this embodiment, a labeling tool is used to generate the corresponding masks; the labeling tool may be labelImg (a target-detection labeling tool). The categories include category 1 through category 5: the name field is labeled category 1, gender category 2, ethnicity category 3, address category 4, and identity card number category 5. The date of birth is not labeled because it can be derived from the identity card number. Labeling the identity card picture set to generate the identity card image training set facilitates the subsequent training of the identity card segmentation model.
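For illustration only, the following is a minimal sketch of how such rectangle labels could be rendered into mask training targets; Python/NumPy and the (x1, y1, x2, y2, category) tuple format are assumptions of this sketch, not details given in the application.

import numpy as np

# Category ids from the description: 1 name, 2 gender, 3 ethnicity,
# 4 address, 5 identity card number; 0 is background.
def boxes_to_mask(height, width, boxes):
    """boxes: iterable of (x1, y1, x2, y2, category_value) tuples."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2, category in boxes:
        mask[y1:y2, x1:x2] = category  # paint the labeled rectangle with its class id
    return mask

# Usage: one mask per labeled identity card picture (coordinates hypothetical).
mask = boxes_to_mask(480, 640, [(40, 60, 200, 100, 1), (40, 120, 140, 160, 2)])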
S2: and inputting the identity card images in the identity card image training set into the preset identity card segmentation model based on atrous spatial pyramid pooling (ASPP) to obtain a first prediction result.
In this embodiment, semantic segmentation is a typical computer vision problem: raw data such as a planar image is taken as input and converted into a mask whose regions of interest are highlighted. A commonly used approach is full-pixel semantic segmentation, in which each pixel in an image is assigned a class ID according to the object of interest it belongs to, i.e., a process of classifying every pixel into an object class. It is of great importance in autonomous driving systems (specifically street-view recognition and understanding), unmanned aerial vehicle applications (landing-site judgment) and wearable device applications. The identity card segmentation model here is constructed based on atrous spatial pyramid pooling (ASPP), and each key information field is segmented out of the identity card by combining the identity card image training set.
Specifically, the step of inputting the identity card picture into the preset ASPP-based identity card segmentation model to obtain a first prediction result comprises:
downsampling the identity card picture multiple times through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result;
upsampling the output result to obtain an upsampling result;
acquiring the identity card picture after the first downsampling, and performing low-level feature extraction and convolution on it to obtain a convolution result;
and concatenating the upsampling result and the convolution result to obtain a concatenation result, and sequentially performing convolution and upsampling on the concatenation result to obtain the first prediction result.
In this embodiment, the identity card picture after the first downsampling undergoes low-level feature extraction and convolution, and the result is concatenated with the output of the atrous spatial pyramid pooling (ASPP) module. ASPP performs atrous convolution on a given input at different sampling rates, which is equivalent to capturing the context of the image at multiple scales. Most current methods concatenate features that the backbone network has downsampled many times with the ASPP output. However, the three key fields of name, gender and ethnicity each occupy less than 1% of the whole identity card image and are therefore small target objects, and more of their position information is lost after repeated downsampling; the position information is therefore preserved here to improve the segmentation of small target objects. Low-level features refer to small detail information in the image, such as edges, corners, colors, pixels and gradients, which can be extracted by filters.
In addition, the specific values used in downsampling the identity card picture, feeding the downsampling result through the ASPP module, and upsampling the output result are as follows: the identity card picture undergoes four rounds of 2× downsampling, i.e., 16× downsampling in total, and the downsampling result is input into the ASPP module of the identity card segmentation model to obtain the output result; the output result is then upsampled 8× to obtain the upsampling result.
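A rough sketch of this sampling geometry follows. PyTorch is an assumption (the application names no framework), the stride-2 convolutions merely stand in for the unnamed backbone, and the ASPP module is stubbed by a 1×1 convolution.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        stages, in_ch = [], 3
        for _ in range(4):  # four rounds of 2x downsampling = 16x in total
            stages += [nn.Conv2d(in_ch, channels, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = channels
        self.stages = nn.Sequential(*stages)
        self.aspp_stub = nn.Conv2d(channels, channels, 1)  # placeholder for ASPP

    def forward(self, x):
        feats = self.stages(x)          # H/16 x W/16 feature map
        out = self.aspp_stub(feats)
        return F.interpolate(out, scale_factor=8, mode="bilinear",
                             align_corners=False)  # 8x upsampling -> H/2 x W/2

print(EncoderSketch()(torch.randn(1, 3, 256, 256)).shape)  # [1, 64, 128, 128]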
Wherein the step of inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result comprises:
inputting the downsampling result separately into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 kernel using atrous convolution at the first sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at the second sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at the third sampling rate, and a pooling layer, to obtain the respective intermediate results;
and concatenating all the intermediate results and passing them sequentially through two convolution layers with 3×5 kernels and a convolution layer with a 1×1 kernel to obtain the output result.
In this embodiment, two 3×5 convolution layers are added before the final 1×1 convolution of the ASPP module to obtain richer semantic features. The sampling rate is set by a rate parameter: when rate = 1, no information is lost from the original image and the operation is a standard convolution; when rate > 1, e.g. 2, the original image is sampled every (rate − 1) pixels. The first, second and third sampling rates are all different in this application: the first sampling rate is 6, the second is 12, and the third is 16. In practical application, the three sampling rates can be adjusted according to actual requirements. Setting different sampling rates yields richer image features.
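A minimal sketch of this ASPP variant in PyTorch follows (an assumption of this sketch; the channel widths are also illustrative). For a dilated kernel of size k, padding of rate × (k − 1) / 2 per dimension preserves the spatial size, hence padding (rate, 2 × rate) for the 3×5 kernels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP3x5(nn.Module):
    def __init__(self, in_ch=64, mid_ch=64, out_ch=64):
        super().__init__()
        def branch(rate):  # 3x5 atrous convolution, spatial size preserved
            return nn.Conv2d(in_ch, mid_ch, (3, 5), dilation=rate,
                             padding=(rate, 2 * rate))
        self.b1x1 = nn.Conv2d(in_ch, mid_ch, 1)          # 1x1 branch
        self.b6, self.b12, self.b16 = branch(6), branch(12), branch(16)
        self.pool = nn.AdaptiveAvgPool2d(1)              # pooling branch
        self.pool_conv = nn.Conv2d(in_ch, mid_ch, 1)
        self.project = nn.Sequential(                    # two 3x5 convs, then 1x1
            nn.Conv2d(5 * mid_ch, mid_ch, (3, 5), padding=(1, 2)),
            nn.Conv2d(mid_ch, mid_ch, (3, 5), padding=(1, 2)),
            nn.Conv2d(mid_ch, out_ch, 1))

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = F.interpolate(self.pool_conv(self.pool(x)), size=(h, w),
                               mode="bilinear", align_corners=False)
        cat = torch.cat([self.b1x1(x), self.b6(x), self.b12(x),
                         self.b16(x), pooled], dim=1)    # concatenate 5 branches
        return self.project(cat)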
The step of performing low-level feature extraction and convolution on the identity card picture after the first downsampling to obtain a convolution result comprises:
performing low-level feature extraction on the identity card picture after the first downsampling and inputting the result into a convolution layer with a 1×1 kernel to obtain the convolution result;
the step of sequentially performing convolution and upsampling on the concatenation result to obtain the first prediction result comprises:
convolving the concatenation result with a 3×5 kernel and performing 2× upsampling to obtain the first prediction result.
In this embodiment, since the present application performs full-field segmentation of the identity card and all segmentation targets on an identity card are strip-shaped, convolution kernels of size 3×5 are used in the identity card segmentation model, so that the receptive field is larger along the direction of the identity card number (i.e., the lateral direction).
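The decoder path described above can then be sketched as follows, again in PyTorch as an assumption; the channel widths, including the 48-channel projection common in DeepLabV3+-style decoders, are illustrative rather than stated in the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 6  # background plus the 5 key-field categories

class DecoderSketch(nn.Module):
    def __init__(self, low_ch=64, aspp_ch=64):
        super().__init__()
        self.low_proj = nn.Conv2d(low_ch, 48, 1)  # 1x1 conv on low-level features
        self.fuse = nn.Conv2d(48 + aspp_ch, NUM_CLASSES, (3, 5), padding=(1, 2))

    def forward(self, low_feats, aspp_up):
        # low_feats: H/2 x W/2 features taken after the first 2x downsampling;
        # aspp_up:   ASPP output already upsampled 8x to H/2 x W/2.
        fused = torch.cat([self.low_proj(low_feats), aspp_up], dim=1)
        logits = self.fuse(fused)                  # 3x5 convolution
        return F.interpolate(logits, scale_factor=2, mode="bilinear",
                             align_corners=False)  # 2x upsampling -> H x W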
S3: and calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached to obtain the trained identity card segmentation model.
In this embodiment, the preset convergence condition may be reaching a preset number of iterations T, and other stop conditions may be set according to actual needs. The specific loss function may be the Dice loss. Dice loss was proposed for the problem of a too-small foreground proportion and suits the identity card segmentation scenario of this application. Dice refers to the Dice coefficient, which originates from binary classification and essentially measures the overlap between two samples. The formula is: Dice Loss = 1 − DSC(A, B), where DSC(A, B) = 2|A ∩ B| / (|A| + |B|), A is the first prediction result of the model, and B is the user-labeled content in the identity card image training set.
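Written out as code, this is a soft per-class formulation with a smoothing term; both are common conventions assumed here rather than details given in the text.

import torch

def dice_loss(pred, target, eps=1e-6):
    """pred: (N, C, H, W) class probabilities; target: (N, C, H, W) one-hot masks."""
    inter = (pred * target).sum(dim=(2, 3))                 # |A intersect B| per class
    sizes = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))   # |A| + |B| per class
    dsc = (2 * inter + eps) / (sizes + eps)                 # Dice coefficient
    return 1 - dsc.mean()                                   # Dice Loss = 1 - DSC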
S4: and acquiring an identity card picture to be recognized, inputting the identity card picture to be recognized into the trained identity card segmentation model, acquiring a second prediction result, and acquiring a plurality of label frames based on the second prediction result.
In this embodiment, the identity card picture to be recognized is input into the trained identity card segmentation model to obtain the second prediction result. The second prediction result is output as a mask segmentation map, and the label frames (bounding boxes) can be obtained from it by an image-processing method (bounding-box regression).
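The text does not pin down the image-processing method; as one hedged possibility, connected regions of the mask map can be converted to label frames with OpenCV contour extraction, as sketched below.

import cv2
import numpy as np

def mask_to_boxes(mask):
    """mask: (H, W) array of predicted class ids; returns (x, y, w, h, category)."""
    boxes = []
    for category in np.unique(mask):
        if category == 0:          # skip background
            continue
        binary = (mask == category).astype(np.uint8)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:         # one label frame per connected region
            x, y, w, h = cv2.boundingRect(c)
            boxes.append((x, y, w, h, int(category)))
    return boxes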
S5: and screening the plurality of label frames to obtain a target label frame, and intercepting key fields in the identity card picture based on the target label frame.
In this embodiment, the plurality of label frames are screened to obtain the target label frames, which represent the specific position of each key field in the picture. The corresponding field pictures are cropped from the original image according to these positions and fed into the character recognition model.
Specifically, the step of screening the plurality of tag frames to obtain a target tag frame, and intercepting the key field in the identity card picture based on the target tag frame includes:
obtaining category values corresponding to all label frames in the second prediction result;
grouping the label frames by category value to obtain a plurality of categories, the label frames within each category sharing the same category value;
within each category, taking the label frames meeting a preset condition as the target label frames of the current category, until target label frames are determined for every category;
obtaining the position of a key field corresponding to each category based on all target label frames;
and intercepting the key field in the identity card picture based on the position of the key field.
In the content output by the model, because an ethnic-minority identity card carries minority-script characters in addition to Mandarin, there is more interference, and the model may output more than one label frame for the same category. The label frames output by the model therefore need to be screened, and the target label frames meeting the preset condition are selected, so that the positions of the key fields to be recognized can be located through the target label frames.
The step of taking the label frame meeting the preset condition as the target label frame of the current category comprises the following steps:
identifying the area of the label box in each category;
and taking the label frame with the largest area as the target label frame of the current category.
In this embodiment, there is generally a single label frame with the largest area in each category, so the label frame with the largest area can be used directly as the target label frame of the current category, which allows the computer to select target label frames quickly.
Correspondingly, the step of taking the label frame meeting the preset condition as the target label frame of the current category comprises the following steps:
identifying the area of the label box in each category;
and taking the label frame with the area exceeding the preset threshold value as a target label frame of the current category.
In this embodiment, a threshold may instead be set, and every label frame whose area exceeds the preset threshold is used as a target label frame of the current category. The key fields corresponding to the label frames above the threshold are intercepted and fed into the text recognition model for recognition; the computer then judges the content output by the text recognition model and determines the final output.
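Both screening rules can be sketched together as follows; the function shape and the idea of switching rules via an optional threshold are assumptions for illustration.

from collections import defaultdict

def screen_boxes(boxes, area_threshold=None):
    """boxes: (x, y, w, h, category) tuples; returns target label frames per category."""
    by_category = defaultdict(list)
    for box in boxes:
        by_category[box[4]].append(box)
    targets = {}
    for category, group in by_category.items():
        if area_threshold is None:   # rule 1: keep the single largest-area frame
            targets[category] = [max(group, key=lambda b: b[2] * b[3])]
        else:                        # rule 2: keep every frame above the threshold
            targets[category] = [b for b in group
                                 if b[2] * b[3] > area_threshold]
    return targets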
S6: and inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In this embodiment, the text recognition model may be CTPN + CRNN. CTPN + CRNN is a deep-learning OCR method in which CTPN detects the text and CRNN recognizes the characters, and it performs well in character recognition. CTPN (Connectionist Text Proposal Network, from "Detecting Text in Natural Image with Connectionist Text Proposal Network") mainly locates the text lines in the picture. CRNN (Convolutional Recurrent Neural Network) is fast, performs well and has a small model size for character recognition.
Specifically, the step of inputting the intercepted key fields into the pre-trained text recognition model to obtain a recognition result comprises:
determining a target label frame corresponding to the intercepted key field and a category value carried by the target label frame;
determining whether there is a duplicate category value;
when no repeated category value exists, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when repeated category values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining a preset character set corresponding to the result to be judged based on the category value;
identifying whether the corresponding characters in the character set exist in the result to be judged;
and taking the result to be judged whose characters exist in the character set as the recognition result.
In this embodiment, a result to be judged that contains no character from the character set is treated as an error result, and the recognition results are output once all results to be judged have been processed. For example: in the prediction result output by the identity card segmentation model, several label frames may exist under the same category (such as the gender category), and they need to be further screened to determine the target label frame of that category. On a common Han identity card, generally the single label frame exceeding the preset threshold in a category is used directly as the target label frame. An ethnic-minority identity card, however, carries both Mandarin and minority-script characters; in the gender category, for instance, the card prints the word "gender" in Mandarin, and above it the corresponding word in the minority script. If several label frames exceed the preset threshold (owing to the interference of the minority script, the Mandarin text in the category corresponds to one label frame and the minority-script text to another), all label frames exceeding the threshold in the category are used as target label frames, the corresponding key fields are input into the text recognition model, and the characters output by the text recognition model are judged at the end to determine the recognition result. A concrete example: under one category of an ethnic-minority identity card, 2 target label frames are obtained, the corresponding key fields are intercepted and input into the text recognition model, and the two outputs, say "male" and a garbled field, are taken as the results to be judged. The garbled field appears because the text recognition model is inaccurate on minority-script characters. Since gender can only be "male" or "female", i.e., only these characters exist in the gender character set, the character-set check determines that only the result reading "male" contains characters from the set; "male" is therefore taken as the recognition result, and the garbled result is treated as an error result.
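This character-set check can be sketched as follows; the gender set follows the example above, while the other sets and the garbled sample output are hypothetical.

CHARACTER_SETS = {
    2: {"男", "女"},  # gender category: only the characters "male"/"female"
    # other categories would carry their own preset character sets
}

def pick_result(category, candidates):
    """candidates: OCR outputs for one category; returns the valid result or None."""
    charset = CHARACTER_SETS.get(category)
    if charset is None or len(candidates) == 1:
        return candidates[0]
    for text in candidates:
        if any(ch in charset for ch in text):  # characters found in the set
            return text
    return None  # every candidate is an error result

print(pick_result(2, ["男", "嫼"]))  # -> "男"; the garbled output is discarded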
Compared with traditional template matching, the identity card segmentation model has stronger robustness, and its effect is markedly superior to template matching in particular on ethnic-minority identity cards, which contain more interference. Because recognition is performed through the identity card segmentation model rather than through template matching, the method has higher tolerance to light, noise and slight viewing-angle changes, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target label frames. This in turn aids the character detection of the text recognition model and improves the accuracy of identity card identification.
It is emphasized that, to further ensure the privacy and security of the identity card image training set, the training set may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with each other by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The application can be applied to the field of intelligent government affairs and is used for identifying the resident identification card in the intelligent government affairs, so that the construction of a smart city is promoted.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by computer readable instructions instructing the relevant hardware; the instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, their execution is not strictly limited to the order shown and they may be performed in other orders. Moreover, at least a portion of the steps in the flowchart may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential: they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an identification card recognition apparatus based on an identification card segmentation model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the identity card recognition apparatus 300 based on the identity card segmentation model according to this embodiment comprises: a picture labeling module 301, a first input module 302, a loss calculation module 303, a second input module 304, a target screening module 305, and a text recognition module 306. The picture labeling module 301 is configured to receive an identity card picture set, label the identity card picture set and generate an identity card image training set; the first input module 302 is configured to input the identity card images in the identity card image training set into a preset identity card segmentation model based on atrous spatial pyramid pooling (ASPP) to obtain a first prediction result; the loss calculation module 303 is configured to calculate a loss function based on the identity card image training set and the first prediction result, and iteratively train the identity card segmentation model until a preset convergence condition is reached, obtaining a trained identity card segmentation model; the second input module 304 is configured to acquire an identity card picture to be recognized, input it into the trained identity card segmentation model to obtain a second prediction result, and obtain a plurality of label frames based on the second prediction result; the target screening module 305 is configured to screen the plurality of label frames to obtain target label frames, and intercept the key fields in the identity card picture based on the target label frames; and the text recognition module 306 is configured to input the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
In this embodiment, compared with traditional template matching, the identity card segmentation model has stronger robustness, and its effect is markedly superior to template matching in particular on ethnic-minority identity cards, which contain more interference. Because recognition is performed through the identity card segmentation model rather than through template matching, the method has higher tolerance to light, noise and slight viewing-angle changes, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target label frames. This in turn aids the character detection of the text recognition model and improves the accuracy of identity card identification.
The picture labeling module 301 comprises a notification sub-module, an identification sub-module and a generation sub-module. The notification sub-module is used for displaying the identity card picture set in a preset front-end page and notifying a user to mark the identity card picture set; the identification submodule is used for identifying the content of user labels after identifying that the user finishes labeling each identity card picture in the identity card picture set through a preset labeling tool in the front-end page, wherein the content of the user labels comprises a rectangular frame and a user-defined class value of an image in the rectangular frame; and the generation submodule is used for generating masks on all the identity card pictures based on the content marked by the user, finishing the marking of the identity card picture set and obtaining the identity card image training set.
The first input module 302 includes a downsampling sub-module, an upsampling sub-module, an acquisition sub-module, and a concatenation sub-module. The downsampling sub-module is used for downsampling the identity card picture multiple times through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result into the ASPP module of the identity card segmentation model to obtain an output result; the upsampling sub-module is used for upsampling the output result to obtain an upsampling result; the acquisition sub-module is used for acquiring the identity card picture after the first downsampling, and performing low-level feature extraction and convolution on it to obtain a convolution result; and the concatenation sub-module is used for concatenating the upsampling result and the convolution result to obtain a concatenation result, and performing convolution and upsampling on the concatenation result in sequence to obtain the first prediction result.
The downsampling sub-module comprises an input unit and a concatenation unit. The input unit is used for inputting the downsampling result separately into a preset convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×5 kernel using atrous convolution at the first sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at the second sampling rate, a convolution layer with a 3×5 kernel using atrous convolution at the third sampling rate, and a pooling layer, to obtain the respective intermediate results; the concatenation unit is used for concatenating all the intermediate results and passing them sequentially through two convolution layers with 3×5 kernels and a convolution layer with a 1×1 kernel to obtain the output result.
In some optional implementations of this embodiment, the acquisition sub-module is further configured to: perform low-level feature extraction on the identity card picture after the first downsampling and input the result into a convolution layer with a 1×1 kernel to obtain the convolution result.
In some optional implementations of this embodiment, the concatenation sub-module is further configured to: convolve the concatenation result with a 3×5 kernel and perform 2× upsampling to obtain the first prediction result.
The target screening module 305 includes an acquisition sub-module, a division sub-module, a determination sub-module, a location sub-module, and an interception sub-module. The acquisition sub-module is used for obtaining the category values corresponding to all label frames in the second prediction result; the division sub-module is used for grouping the label frames by category value to obtain a plurality of categories, the label frames within each category sharing the same category value; the determination sub-module is used for taking, within each category, the label frames meeting a preset condition as the target label frames of the current category, until target label frames are determined for every category; the location sub-module is used for obtaining the position of the key field corresponding to each category based on all the target label frames; and the interception sub-module is used for intercepting the key fields in the identity card picture based on the positions of the key fields.
In some optional implementations of this embodiment, the determining sub-module is further configured to: and identifying the area of the label frame in each category, and taking the label frame with the largest area as a target label frame of the current category.
In some optional implementations of this embodiment, the determining sub-module is further configured to: and identifying the area of the label frame in each category, and taking the label frame with the area exceeding a preset threshold value as a target label frame of the current category.
The text recognition module 306 includes a category sub-module, a first judgment sub-module, a first recognition sub-module, a second recognition sub-module, a character set sub-module, a second judgment sub-module, and a result sub-module. The category sub-module is used for determining the target label frame corresponding to each intercepted key field and the category value carried by the target label frame; the first judgment sub-module is used for determining whether repeated category values exist; the first recognition sub-module is used for inputting the intercepted key field into the pre-trained text recognition model when no repeated category value exists, obtaining the recognition result; the second recognition sub-module is used for inputting the intercepted key fields into the pre-trained text recognition model when repeated category values exist, obtaining a plurality of results to be judged; the character set sub-module is used for determining the preset character set corresponding to the results to be judged based on the category value; the second judgment sub-module is used for identifying whether characters from the character set exist in the results to be judged; and the result sub-module is used for taking the result to be judged whose characters exist in the character set as the recognition result.
Compared with traditional template matching, the identity card segmentation model has stronger robustness, and its effect is markedly superior to template matching in particular on ethnic-minority identity cards, which contain more interference. Because recognition is performed through the identity card segmentation model rather than through template matching, the method has higher tolerance to light, noise and slight viewing-angle changes, and the positions of the key fields in the identity card picture are located through the identity card segmentation model and the target label frames. This in turn aids the character detection of the text recognition model and improves the accuracy of identity card identification.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 comprises a memory 201, a processor 202 and a network interface 203 communicatively connected to each other via a system bus. It is noted that only a computer device 200 having components 201-203 is shown, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or internal memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 200. Of course, the memory 201 may also include both internal and external storage devices of the computer device 200. In this embodiment, the memory 201 is generally used for storing the operating system installed on the computer device 200 and various types of application software, such as the computer readable instructions of the identity card identification method based on the identity card segmentation model. Further, the memory 201 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 202 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 202 generally controls the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute the computer-readable instructions stored in the memory 201 or to process data, for example, to execute the computer-readable instructions of the identity card identification method based on the identity card segmentation model.
The network interface 203 may comprise a wireless network interface or a wired network interface, and is generally used for establishing a communication connection between the computer device 200 and other electronic devices.
In this embodiment, the identity card segmentation model is designed specifically around the attribute characteristics of the identity card, which improves the segmentation precision of the model; combined with the text recognition model, the accuracy of identity card recognition is effectively improved.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the identification card identification method based on the identification card segmentation model as described above.
In this embodiment, the identity card segmentation model is designed specifically around the attribute characteristics of the identity card, which improves the segmentation precision of the model; combined with the text recognition model, the accuracy of identity card recognition is effectively improved.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application is capable of embodiment in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may be modified, or some of their features may be replaced by equivalents. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An identity card identification method based on an identity card segmentation model is characterized by comprising the following steps:
receiving an identity card picture set, and labeling the identity card picture set to obtain an identity card image training set;
inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on a void space convolution pooling pyramid (atrous spatial pyramid pooling) to obtain a first prediction result;
calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached, to obtain a trained identity card segmentation model;
acquiring an identity card picture to be recognized, inputting the identity card picture to be recognized into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
screening the plurality of label frames to obtain a target label frame, and intercepting key fields in the identity card picture based on the target label frame;
and inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
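A minimal training-loop sketch of the training steps in claim 1, assuming a PyTorch implementation; the pixel-wise cross-entropy loss, the Adam optimiser, and a fixed epoch budget standing in for the preset convergence condition are all assumptions, since the claim does not fix them.

```python
import torch
import torch.nn as nn

def train_segmentation_model(model, train_loader, epochs=50, lr=1e-3):
    """Iteratively train the identity card segmentation model on the
    labelled identity card image training set."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()      # pixel-wise loss over category values
    for _ in range(epochs):              # stand-in for the convergence condition
        for images, masks in train_loader:
            optimiser.zero_grad()
            first_prediction = model(images)   # the first prediction result
            loss = loss_fn(first_prediction, masks)
            loss.backward()
            optimiser.step()
    return model
```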
2. The identity card identification method based on the identity card segmentation model of claim 1, wherein the step of inputting the identity card picture into the preset identity card segmentation model based on the void space convolution pooling pyramid to obtain the first prediction result comprises:
downsampling the identity card picture multiple times through the identity card segmentation model to obtain a downsampling result, and inputting the downsampling result into the void space convolution pooling pyramid of the identity card segmentation model to obtain an output result;
upsampling the output result to obtain an upsampled result;
acquiring the identity card picture after the first downsampling, and performing low-level feature acquisition and convolution on the identity card picture after the first downsampling to obtain a convolution result;
and splicing the up-sampling result and the convolution result to obtain a splicing result, and sequentially performing convolution and up-sampling on the splicing result to obtain the first prediction result.
3. The identity card identification method based on the identity card segmentation model of claim 2, wherein the step of inputting the downsampling result into the void space convolution pooling pyramid of the identity card segmentation model to obtain the output result comprises:
inputting the downsampling result respectively into a preset convolution layer with a 1 × 1 convolution kernel, a convolution layer with a 3 × 5 convolution kernel based on dilated convolution at a first sampling rate, a convolution layer with a 3 × 5 convolution kernel based on dilated convolution at a second sampling rate, a convolution layer with a 3 × 5 convolution kernel based on dilated convolution at a third sampling rate, and a pooling layer, to obtain respective intermediate results;
and splicing all the intermediate results, and sequentially inputting the spliced result into two convolution layers with 3 × 5 convolution kernels and a convolution layer with a 1 × 1 convolution kernel to obtain the output result.
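By way of illustration, the pyramid of claim 3 can be sketched in PyTorch as follows. The claim fixes the kernel shapes and the branch layout but not the channel width or the three sampling rates, so the width 256 and the rates 6/12/18 here are assumptions borrowed from common atrous-pyramid designs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoidSpacePyramid(nn.Module):
    """Sketch of the void space convolution pooling pyramid: a 1x1 branch,
    three 3x5 dilated-convolution branches at different sampling rates, and
    a pooling branch, whose intermediate results are spliced and passed
    through two 3x5 convolutions and a 1x1 convolution."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.dilated = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=(3, 5),
                      padding=(r, 2 * r), dilation=r)  # padding keeps spatial size
            for r in rates
        ])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.fuse = nn.Sequential(
            nn.Conv2d(5 * out_ch, out_ch, kernel_size=(3, 5), padding=(1, 2)),
            nn.Conv2d(out_ch, out_ch, kernel_size=(3, 5), padding=(1, 2)),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [self.conv1x1(x)] + [conv(x) for conv in self.dilated]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        intermediate = torch.cat(branches + [pooled], dim=1)  # splice branches
        return self.fuse(intermediate)
```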
4. The identity card identification method based on the identity card segmentation model of claim 2, wherein the step of performing low-level feature acquisition and convolution on the identity card picture after the first downsampling to obtain a convolution result comprises:
performing low-level feature acquisition on the identity card picture after the first downsampling, and inputting the result into a convolution layer with a 1 × 1 convolution kernel to obtain the convolution result;
the step of sequentially performing convolution and upsampling on the splicing result to obtain the first prediction result comprises:
and convolving the splicing result with a 3 × 5 convolution kernel, and performing double upsampling to obtain the first prediction result.
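Claims 2 and 4 together describe an encoder-decoder arrangement: low-level features from the first downsampling are projected by a 1 × 1 convolution, spliced with the upsampled pyramid output, convolved with a 3 × 5 kernel, and upsampled. A minimal PyTorch sketch, in which the projected width (48), the class count, and the reading of "double upsampling" as a 2× bilinear resize are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpliceDecoder(nn.Module):
    """Sketch of the splice-and-upsample steps of claims 2 and 4."""
    def __init__(self, low_ch, pyramid_ch=256, low_out=48, num_classes=8):
        super().__init__()
        self.project = nn.Conv2d(low_ch, low_out, kernel_size=1)
        self.classify = nn.Conv2d(pyramid_ch + low_out, num_classes,
                                  kernel_size=(3, 5), padding=(1, 2))

    def forward(self, pyramid_out, low_feat):
        up = F.interpolate(pyramid_out, size=low_feat.shape[2:],
                           mode="bilinear", align_corners=False)  # upsampling result
        spliced = torch.cat([up, self.project(low_feat)], dim=1)  # splicing result
        logits = self.classify(spliced)
        return F.interpolate(logits, scale_factor=2,              # "double upsampling"
                             mode="bilinear", align_corners=False)
```

A forward pass would then chain the pieces: feed the deepest downsampling result through the pyramid sketched above, and pass its output together with the first-downsampling features to this decoder to produce the first prediction result.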
5. The identity card identification method based on the identity card segmentation model according to claim 1, wherein the second prediction result includes category values, and the step of screening the plurality of label frames to obtain target label frames and intercepting the key fields in the identity card picture based on the target label frames comprises:
obtaining category values corresponding to all label frames in the second prediction result;
classifying the label frames based on the category values to obtain a plurality of categories, wherein the category value within each category is the same;
within each category, taking the label frame meeting a preset condition as the target label frame of the current category, until the target label frame of each category is determined;
obtaining the position of a key field corresponding to each category based on all target label frames;
and intercepting the key field in the identity card picture based on the position of the key field.
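A short sketch of this screening step, assuming each predicted label frame arrives as a (box, score, category_value) triple and that the "preset condition" is simply the highest confidence score per category; the claim itself does not fix the condition.

```python
def select_target_label_frames(label_frames):
    """Keep, per category value, the best-scoring label frame."""
    best = {}
    for box, score, category in label_frames:
        if category not in best or score > best[category][1]:
            best[category] = (box, score)
    # position of the key field for each category
    return {category: box for category, (box, _) in best.items()}

def intercept_key_fields(image, target_frames):
    """Crop each key field from the identity card picture (an HWC array)."""
    return {category: image[y1:y2, x1:x2]
            for category, (x1, y1, x2, y2) in target_frames.items()}
```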
6. The identity card identification method based on the identity card segmentation model of claim 1, wherein the step of inputting the intercepted key fields into the pre-trained text recognition model to obtain a recognition result comprises:
determining the target label frame corresponding to each intercepted key field and the category value carried by the target label frame;
determining whether repeated category values exist;
when no repeated category values exist, inputting the intercepted key field into the pre-trained text recognition model to obtain the recognition result;
when repeated category values exist, inputting the intercepted key fields into the pre-trained text recognition model to obtain a plurality of results to be judged;
determining, based on the category value, a preset character set corresponding to the results to be judged;
identifying whether characters from the corresponding character set exist in each result to be judged;
and taking the result to be judged that contains characters from the character set as the recognition result.
7. The identity card identification method based on the identity card segmentation model of claim 1, wherein the step of labeling the identity card picture set to obtain the identity card image training set comprises:
displaying the identity card picture set in a preset front-end page, and prompting the user to label the identity card picture set;
when it is identified that the user has labeled all the identity card pictures in the identity card picture set through a preset labeling tool in the front-end page, identifying the content labeled by the user, wherein the labeled content comprises a rectangular frame and a user-defined category value for the image in the rectangular frame;
and generating a mask on each identity card picture based on the content labeled by the user, completing the labeling of the identity card picture set, and obtaining the identity card image training set.
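For illustration, the mask generation could be sketched as follows, assuming each labeled rectangle is flattened to a (category_value, x1, y1, x2, y2) tuple; the tuple format is an assumption, not prescribed by the claim.

```python
import numpy as np

def generate_mask(height, width, annotations):
    """Rasterise the user-drawn rectangular frames and their user-defined
    category values into a single-channel mask over the picture."""
    mask = np.zeros((height, width), dtype=np.uint8)  # 0 = background
    for category_value, x1, y1, x2, y2 in annotations:
        mask[y1:y2, x1:x2] = category_value
    return mask
```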
8. An identity card recognition device based on an identity card segmentation model, characterized by comprising:
the picture labeling module is used for receiving an identity card picture set, and labeling the identity card picture set to generate an identity card image training set;
the first input module is used for inputting the identity card images in the identity card image training set into a preset identity card segmentation model based on a void space convolution pooling pyramid to obtain a first prediction result;
the loss calculation module is used for calculating a loss function based on the identity card image training set and the first prediction result, and iteratively training the identity card segmentation model until a preset convergence condition is reached to obtain a trained identity card segmentation model;
the second input module is used for acquiring an identity card picture to be recognized, inputting the identity card picture to be recognized into the trained identity card segmentation model to obtain a second prediction result, and obtaining a plurality of label frames based on the second prediction result;
the target screening module is used for screening the plurality of label frames to obtain a target label frame and intercepting key fields in the identity card picture based on the target label frame; and
and the text recognition module is used for inputting the intercepted key fields into a pre-trained text recognition model to obtain a recognition result.
9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the identity card identification method based on the identity card segmentation model according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the identity card identification method based on the identity card segmentation model according to any one of claims 1 to 7.
CN202011286478.7A 2020-11-17 2020-11-17 Identification card recognition method based on identification card segmentation model and related equipment thereof Active CN112396060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011286478.7A CN112396060B (en) 2020-11-17 2020-11-17 Identification card recognition method based on identification card segmentation model and related equipment thereof

Publications (2)

Publication Number Publication Date
CN112396060A true CN112396060A (en) 2021-02-23
CN112396060B CN112396060B (en) 2024-03-15

Family

ID=74605844

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180253622A1 (en) * 2017-03-06 2018-09-06 Honda Motor Co., Ltd. Systems for performing semantic segmentation and methods thereof
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN111144400A (en) * 2018-11-06 2020-05-12 北京金山云网络技术有限公司 Identification method and device for identity card information, terminal equipment and storage medium
CN110136136A (en) * 2019-05-27 2019-08-16 北京达佳互联信息技术有限公司 Scene Segmentation, device, computer equipment and storage medium
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN111507993A (en) * 2020-03-18 2020-08-07 南方电网科学研究院有限责任公司 Image segmentation method and device based on generation countermeasure network and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DAI, Weida: "Research on Semantic Segmentation Algorithms Based on Fully Convolutional Neural Networks", China Master's Theses Full-text Database (Information Science and Technology), no. 2, 15 February 2020 (2020-02-15), pages 4-58 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051901A (en) * 2021-03-26 2021-06-29 重庆紫光华山智安科技有限公司 Identification card text recognition method, system, medium and electronic terminal
CN116129456A (en) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant