CN111666932A - Document auditing method and device, computer equipment and storage medium - Google Patents

Document auditing method and device, computer equipment and storage medium

Info

Publication number
CN111666932A
CN111666932A
Authority
CN
China
Prior art keywords
character
model
character string
document
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010461277.XA
Other languages
Chinese (zh)
Other versions
CN111666932B (en)
Inventor
唐子豪
刘莉红
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010461277.XA priority Critical patent/CN111666932B/en
Publication of CN111666932A publication Critical patent/CN111666932A/en
Application granted granted Critical
Publication of CN111666932B publication Critical patent/CN111666932B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a document auditing method and device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a document image to be audited that is associated with a document auditing instruction; inputting the document image to be audited into a character string area recognition model, and acquiring at least three character string area images through a YOLO algorithm; inputting each character string area image into a trained lightweight character recognition model, which extracts character features from the character string area images to obtain recognition results; judging whether the document information is the same as the document number to obtain a first judgment result, whether the identity card information is the same as the identity card number to obtain a second judgment result, and whether the bank card information is the same as the bank card number to obtain a third judgment result; and if the first, second and third judgment results are all "same", determining that the audit is passed. The invention realizes automatic recognition and checking of the character strings in a document, thereby achieving automatic document auditing.

Description

Document auditing method and device, computer equipment and storage medium
Technical Field
The invention relates to the field of data processing, in particular to a document auditing method and device, computer equipment and a storage medium.
Background
With the development of science and technology, more and more scenes require various document images to be audited; for example, in many scenes the images of documents filled in by various personnel need to be checked and verified to determine whether the information in them is wrong, which involves an extremely heavy workload. Handwritten characters often appear in such filled-in documents, and at present handwritten characters must be recognized manually, with a certain recognition error rate. In the prior art, document images are also recognized through OCR (Optical Character Recognition) technology; however, since existing OCR technology recognizes characters one by one, its response time for characters (especially handwritten characters) is long, the program capacity and load are large, and the burden on the server increases, so the recognition efficiency hits a bottleneck and the server may run overloaded, which ultimately affects auditing efficiency and leaves customers dissatisfied.
Disclosure of Invention
The invention provides a document auditing method and device, computer equipment and a storage medium, which realize automatic recognition of the character strings in a document and automatic checking of those character strings, achieving automatic document auditing. The capacity of the model is greatly reduced and its structure simplified, so the method can be applied to mobile devices; the recognition accuracy and reliability, as well as customer satisfaction, are improved.
A document auditing method comprises the following steps:
after receiving a document auditing instruction, acquiring a document image to be audited associated with the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of an object and a bank card number of the object;
inputting the document image to be audited into a character string area recognition model, and acquiring at least three character string area images in the document image to be audited, which are recognized by the character string area recognition model, through a YOLO algorithm;
inputting each character string area image into a trained lightweight character recognition model, wherein the lightweight character recognition model performs character feature extraction on the character string area image to obtain a recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class and bank card information corresponding to the bank card class; the lightweight character recognition model is a neural network model based on the ShuffleNet model;
judging whether the document information is the same as the document number to obtain a first judgment result, judging whether the identity card information is the same as the identity card number to obtain a second judgment result, and judging whether the bank card information is the same as the bank card number to obtain a third judgment result;
and if the first judgment result, the second judgment result and the third judgment result are the same, determining that the document image to be audited passes the audit.
A document auditing apparatus, comprising:
the receiving module is used for acquiring a document image to be audited associated with the document auditing instruction after receiving the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of an object and a bank card number of the object;
the obtaining module is used for inputting the document image to be audited into a character string area recognition model and acquiring at least three character string area images in the document image to be audited, which are recognized by the character string area recognition model, through a YOLO algorithm;
the recognition module is used for inputting each character string area image into a trained lightweight character recognition model, which extracts character features from the character string area image and acquires the recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class and bank card information corresponding to the bank card class; the lightweight character recognition model is a neural network model based on the ShuffleNet model;
the judging module is used for judging whether the document information is the same as the document number to obtain a first judging result, judging whether the identity card information is the same as the identity card number to obtain a second judging result, and judging whether the bank card information is the same as the bank card number to obtain a third judging result;
and the determining module is used for determining that the document image to be audited passes the audit if the first judging result, the second judging result and the third judging result are the same.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the document auditing method when executing the computer program.
A computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, carries out the steps of the above-mentioned document auditing method.
According to the document auditing method and device, the computer equipment and the storage medium, at least three character string area images (including printed character string area images and handwritten character string area images) in the document image to be audited are recognized through a YOLO algorithm; the character string area images are input into a lightweight character recognition model based on the ShuffleNet model for character feature extraction to obtain recognition results, and whether the audit passes is determined according to the recognition results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a document auditing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a document review method in an embodiment of the invention;
FIG. 3 is a flow chart of a document review method in another embodiment of the invention;
FIG. 4 is a flowchart of step S20 of a document review method according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S30 of a document review method according to an embodiment of the present invention;
FIG. 6 is a flowchart of step S302 of a document auditing method according to an embodiment of the present invention;
FIG. 7 is a flowchart of step S30201 of a document review method according to an embodiment of the present invention;
FIG. 8 is a flowchart of step S30 of a document review method according to another embodiment of the present invention;
FIG. 9 is a functional block diagram of a document auditing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The document auditing method provided by the invention can be applied to the application environment shown in figure 1, wherein a client (computer equipment) is communicated with a server through a network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a document auditing method is provided, which mainly includes the following steps S10-S50:
S10, after receiving a document auditing instruction, acquiring a document image to be audited associated with the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of an object and a bank card number of the object.
Understandably, after a worker fills in a document and photographs the filled-in document, the photographed document image to be audited is submitted for auditing, which triggers a document auditing instruction associated with that image. The document auditing instruction comprises the document number, the identity card number of the object and the bank card number of the object, and the document image to be audited contains the filled-in content (including printed content and handwritten content) for those fields. For example: when a user applies for a loan, a worker needs to fill in the loan contract number, the identity card number of the borrower and the bank card number of the borrower; when applying for a vehicle insurance claim document for a customer, a worker needs to fill in the claim document number, the customer's identity card number, the customer's bank card number, and so on.
The document number has a preset fixed length; the identity card number of the object is an 18-character string composed of digits and letters, and the bank card number of the object is a 16-digit string composed of digits.
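As an illustration, a minimal format check for these fields can be sketched as below, based only on the formats stated above. The trailing-'X' detail of Chinese identity card numbers is general knowledge rather than something the patent states, and no pattern is given for the document number since only its fixed length is specified.

```python
import re

# Assumed field formats: an 18-character identity card number (17 digits plus
# a digit or 'X') and a 16-digit bank card number.
ID_CARD = re.compile(r"^\d{17}[\dX]$")
BANK_CARD = re.compile(r"^\d{16}$")

def fields_look_valid(id_no: str, bank_no: str) -> bool:
    return bool(ID_CARD.match(id_no)) and bool(BANK_CARD.match(bank_no))
```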
S20, inputting the document image to be audited into a character string area recognition model, and obtaining at least three character string area images in the document image to be audited, which are recognized by the character string area recognition model, through a YOLO algorithm.
Understandably, the YOLO (You Only Look Once) algorithm performs grid segmentation on an input image and then detects, per grid cell, the areas of the objects to be recognized. The character string area recognition model is a trained YOLO-based neural network model. It extracts character string features from the document image to be audited, the character string features being the image features of character strings composed of consecutive digits or letters, and recognizes at least three character string area images in the document image to be audited according to those features. The character string area images comprise a document number image containing the document number content, an identity card number image containing the identity card number content, and a bank card number image containing the bank card number content; they include both printed character string area images and handwritten character string area images, where a printed character string area image is an image of the area containing a printed character string, and a handwritten character string area image is an image of the area containing a handwritten character string.
In an embodiment, as shown in fig. 4, step S20, that is, obtaining, through the YOLO algorithm, the character string area images in the document image to be audited that are recognized by the character string area recognition model, includes:
S201, a preprocessing model in the character string area recognition model performs gray processing on the document image to be audited to obtain a gray-scale image.
Understandably, the character string area recognition model comprises the preprocessing model. The preprocessing model performs gray processing on the document image to be audited, converting each pixel of the image to a gray value; the gray-scale image is the document image to be audited after gray processing. Gray processing makes the character strings in the document image stand out more clearly, so the recognition is more reliable.
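A minimal sketch of this gray processing is given below, using OpenCV as an assumed library (the patent names none); the file name is a placeholder.

```python
import cv2

image = cv2.imread("document_to_audit.jpg")           # document image to be audited
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # per-pixel gray processing (S201)
```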
S202, inputting the gray-scale image into a YOLO recognition model in the character string area recognition model.
Understandably, the character string area recognition model further includes the YOLO recognition model, a neural network model based on the YOLO algorithm. Its network structure may be set according to requirements, for example a YOLO v1, YOLO v2 or YOLO v3 network structure; preferably it is a YOLO v3 network structure, so that the areas recognized by the YOLO recognition model are more accurate and the recognition is more efficient.
S203, through a YOLO algorithm, the YOLO recognition model extracts character string features in the gray-scale image to obtain an identification area containing character strings in the gray-scale image.
Understandably, the character string features are the image features of character strings composed of consecutive digits or letters; the character strings in the gray-scale image are recognized through the YOLO algorithm, and the areas recognized in the gray-scale image are marked as identification areas.
S204, cropping the identification area to obtain the character string area image.
Understandably, each identification area recognized in the gray-scale image is cropped out, and the cropped image is taken as the character string area image.
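A minimal sketch of the cropping in S204 follows; the bounding box is a hypothetical placeholder for one identification area returned by the YOLO recognition model.

```python
import cv2

gray_image = cv2.imread("document_to_audit.jpg", cv2.IMREAD_GRAYSCALE)
x, y, w, h = 120, 80, 320, 48                     # one detected identification area
string_area_image = gray_image[y:y + h, x:x + w]  # cropped character string area image
```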
In this way, the preprocessing model performs gray processing on the document image to be audited to obtain a gray-scale image, which is input into the YOLO recognition model in the character string area recognition model; the character string features in the gray-scale image are extracted through the YOLO algorithm to obtain the identification areas containing character strings; and the identification areas are cropped into the character string area images. Through gray processing and the YOLO algorithm, the character string area images in the document image to be audited can be recognized rapidly and accurately, improving recognition accuracy and reliability.
S30, inputting each character string area image into a trained lightweight character recognition model, wherein the lightweight character recognition model extracts character features from the character string area images and the recognition result output by the lightweight character recognition model according to the extracted character features is acquired; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class and bank card information corresponding to the bank card class; the lightweight character recognition model is a neural network model based on the ShuffleNet model.
Understandably, the character features are the image features of printed and handwritten letters and digits, and also include image length features, because the printed character string area image corresponding to a character string is close in size to the corresponding handwritten character string area image, within a preset tolerance range that can be set according to requirements. The trained lightweight character recognition model is obtained by training the model with samples containing character strings (including identity card numbers, bank card numbers, document numbers and the like) that meet the tolerance range. The lightweight character recognition model extracts the character features of each character string area image (including the document number image, the identity card number image and the bank card number image) and predicts and outputs the recognition result according to those features. It can therefore rapidly recognize both printed and handwritten character strings, without splitting the string and recognizing characters one by one, and without distinguishing printing from handwriting, which greatly simplifies the network structure of the neural network and greatly reduces the capacity of the model. The recognition result comprises the character string type and the character string information of the character string area image. The character string types comprise a document class, an identity card class and a bank card class: the document class contains the document number, the identity card class contains the identity card number, and the bank card class contains the bank card number. The character string information comprises document information corresponding to the document class (the recognized document number), identity card information corresponding to the identity card class (the recognized identity card number) and bank card information corresponding to the bank card class (the recognized bank card number).
The lightweight character recognition model is a neural network model based on the ShuffleNet model. The ShuffleNet model contains a basic ResNet-style lightweight structure and is itself a lightweight neural network model, so it can greatly reduce the computation of the model while keeping its precision, greatly reducing the capacity of the neural network; applying the lightweight character recognition model on mobile devices (i.e. running it on the mobile device and providing the results to the server for processing) can reduce the capacity and operating load of the server. The network structure of the ShuffleNet model may be set according to requirements, for example a ShuffleNet V1 or ShuffleNet V2 network structure. A ShuffleNet V2 unit may begin with a channel split that separates two branches; one branch passes through a 1 × 1 convolution, then a 3 × 3 depthwise convolution (DWConv, mainly to reduce the computation), then another 1 × 1 convolution whose output is coupled to the shortcut branch; finally a channel shuffle (an operation that permutes the channel order to mix the two branches) outputs the final result.
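To make the unit concrete, below is a minimal PyTorch sketch (an assumption; the patent names no framework) of one stride-1 ShuffleNet V2 unit as just described: channel split into two branches, a 1 × 1 convolution, a 3 × 3 depthwise convolution, another 1 × 1 convolution, concatenation with the shortcut branch, and a channel shuffle. All channel counts are illustrative.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Channel shuffle: reshape to (N, groups, C/groups, H, W), swap the two
    # channel axes and flatten back, interleaving channels from the branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class ShuffleV2Unit(nn.Module):
    """One stride-1 ShuffleNet V2 basic unit as described above."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
            # 3 x 3 depthwise convolution (groups == channels), mainly to reduce computation
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut, main = x.chunk(2, dim=1)                 # channel split
        out = torch.cat((shortcut, self.branch(main)), 1)  # couple the branch outputs
        return channel_shuffle(out, groups=2)              # shuffle the channel order
```

The depthwise convolution, with one filter per channel, is what keeps the computation low; this is the property the lightweight character recognition model relies on.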
In an embodiment, as shown in fig. 5, step S30, that is, the lightweight character recognition model extracting character features from the character string area image and acquiring the recognition result output by the lightweight character recognition model according to the extracted character features, includes:
s301, inputting the character string area image into a first convolution layer in the lightweight character recognition model, and compressing and dimension-increasing the character string area image by the first convolution layer to obtain a first character feature map.
Understandably, the lightweight character recognition model includes a first convolutional layer, a second convolutional layer, a pooling layer, a fully connected layer and an output layer. The first convolutional layer may be set according to requirements: for example, a single convolutional layer (a 3 × 3 × 24 convolution kernel with a step size of 2), or a convolutional layer (3 × 3 kernel) followed by a max pooling layer (3 × 3 pooling). Considering the characteristics of the characters, the first convolutional layer is preferably a single convolutional layer. It compresses the character string area image and raises its dimension: processing the image with the 3 × 3 × 24 kernel reduces its spatial size while raising its dimension (usually 3) to 24 dimensions, producing the first character feature map, a multi-dimensional numerical matrix array.
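As an illustration of this step, here is a minimal sketch of the preferred single first convolutional layer; the 3-channel input and its size are assumptions.

```python
import torch
import torch.nn as nn

# 3 x 3 x 24 convolution kernel with a step size of 2: halves the spatial size of
# the character string area image and raises its 3 dimensions to 24 dimensions.
first_conv = nn.Conv2d(in_channels=3, out_channels=24,
                       kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 64, 256)               # hypothetical string area image (N, C, H, W)
first_character_feature_map = first_conv(x)
print(first_character_feature_map.shape)     # torch.Size([1, 24, 32, 128])
```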
S302, inputting the first character feature map into a second convolution layer in the lightweight character recognition model, wherein the second convolution layer performs character feature extraction on the first character feature map to obtain a second character feature map; the second convolutional layer is a convolutional layer based on a down-sampling model and a ShuffleNet model.
Understandably, the second convolutional layer comprises several down-sampling models with different channel parameters and several ShuffleNet models with different convolution kernel parameters. The down-sampling models reduce the size of the feature map while adding channels (which are the dimensions referred to throughout this text) so as to extract wider features, improving the fault tolerance of the whole model; the greatest strength of the ShuffleNet model is feature extraction, capturing features more accurately. The character features are the image features of printed and handwritten letters and digits, and also include image length features; the second character feature map is the matrix array obtained by extracting the character features of the first character feature map.
In an embodiment, as shown in fig. 6, the step S302 of inputting the first character feature map into a second convolution layer in the lightweight character recognition model, where the second convolution layer performs character feature extraction on the first character feature map to obtain a second character feature map includes:
s30201, inputting the first character feature map into a first fusion model in the second convolutional layer, and performing downsampling processing and feature extraction processing on the first character feature map by using the first fusion model to obtain a first fusion feature map; the first fusion model includes a first downsampling model and a first ShuffleNet model.
Understandably, the second convolutional layer includes a first fusion model, a second fusion model and a third fusion model, all constructed from a down-sampling model and the ShuffleNet model. The first fusion model includes the first down-sampling model and the first ShuffleNet model, where the first down-sampling model is a down-sampling model with a first sampling parameter. The down-sampling model is a preset convolutional neural network model comprising a general module and an enhancement module, which each process the input feature map so as to increase its number of channels and highlight the character features. The network structure and parameters of the down-sampling model are determined by first channel parameters, which can be set according to requirements and include all parameters of the general module and the enhancement module; the general module can include a 3 × 3 convolution kernel and a 1 × 1 convolution kernel, and the enhancement module adds a 1 × 1 convolution kernel before the general module. The first down-sampling model performs down-sampling on the first character feature map; down-sampling, also called sub-sampling, extracts feature values from the feature map and reduces its channels (i.e. dimensions) through convolution, effectively reducing the channels and preventing over-fitting.
The first ShuffleNet model is a neural network model based on the ShuffleNet model; it receives the output of the first down-sampling model and performs feature extraction on it, the feature extraction being the extraction of character features. The network structure of the first ShuffleNet model can be set according to requirements, for example the network structure of ShuffleNet V1 or ShuffleNet V2; preferably, it is a network structure spliced from the network structures of three ShuffleNet V2 units.
In an embodiment, as shown in fig. 7, step S30201, that is, inputting the first character feature map into the first fusion model in the second convolutional layer, the first fusion model performing down-sampling and feature extraction on the first character feature map to obtain the first fusion feature map, includes:
S302011, inputting the first character feature map into a general module in the first down-sampling model, and inputting the first character feature map into an enhancement module in the first down-sampling model.

Understandably, the general module comprises a 3 × 3 convolution with a step size of 2 followed by a 1 × 1 convolution with a step size of 1; the enhancement module adds a 1 × 1 convolution with a step size of 1 before the same stack, i.e. a 1 × 1 convolution (step size 1), a 3 × 3 convolution (step size 2) and a 1 × 1 convolution (step size 1). The general module and the enhancement module each process the first character feature map.

S302012, the general module performs reduced-feature-map processing on the first character feature map to obtain a general matrix, and the enhancement module performs reduced-feature-map processing and enhancement processing on the first character feature map to obtain an enhancement matrix.

Understandably, the reduced-feature-map processing is a convolution that reduces the size of the first character feature map; the general matrix is the matrix array obtained after the general module processes the first character feature map. The enhancement processing is a 1 × 1 convolution with a step size of 1 that highlights the character features; the enhancement matrix is the matrix array obtained after the enhancement module applies both the reduced-feature-map processing and the enhancement processing.

S302013, fusing the general matrix and the enhancement matrix to obtain a fusion matrix.

Understandably, the fusion is a merge between channels; for example, fusing a 2-channel feature map with another 2-channel feature map yields a 4-channel feature map. The channels of the general matrix and the enhancement matrix are merged to obtain the fusion matrix, whose channel number is the sum of the channel numbers of the general matrix and the enhancement matrix.

S302014, performing channel extraction processing on the fusion matrix according to a preset first extraction parameter to obtain a first feature map.

Understandably, the first extraction parameter is the number of channels to extract at random from all the channels; the channel extraction processing randomly extracts that number of channels from the fusion matrix, and the first feature map is the matrix array obtained after channel extraction.
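Steps S302011 to S302014 can be gathered into one hedged sketch. The module below is an assumption built only from the description above; the channel counts, padding, and the way the random channel extraction is drawn are illustrative.

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """A sketch of the first down-sampling model: general module, enhancement
    module, channel fusion and random channel extraction."""
    def __init__(self, in_ch: int, out_ch: int, keep: int):
        super().__init__()
        # General module: 3 x 3 convolution (step size 2), then 1 x 1 convolution (step size 1).
        self.general = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.Conv2d(out_ch, out_ch, 1),
        )
        # Enhancement module: an extra 1 x 1 convolution (step size 1) before the
        # same stack, to highlight the character features.
        self.enhance = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 1),
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.Conv2d(out_ch, out_ch, 1),
        )
        self.keep = keep  # the "first extraction parameter": channels kept after fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        general_matrix = self.general(x)                                # S302012
        enhance_matrix = self.enhance(x)
        fusion = torch.cat((general_matrix, enhance_matrix), dim=1)     # S302013: channel merge
        idx = torch.randperm(fusion.size(1), device=fusion.device)[: self.keep]
        return fusion[:, idx]                                           # S302014: channel extraction
```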
S302015, inputting the first feature map into the first ShuffleNet model, wherein the first ShuffleNet model performs character feature extraction on the first feature map to obtain the first fusion feature map.

Understandably, the first ShuffleNet model is a neural network model formed from three ShuffleNet units with different convolution kernels; its network structure is spliced from the network structures of three ShuffleNet V2 units. The first ShuffleNet model extracts the character features of the first feature map to obtain the first fusion feature map.
In this way, the second convolutional layer processes the first character feature map in three stages: the first fusion model performs down-sampling and feature extraction on the first character feature map to obtain the first fusion feature map; the second fusion model performs down-sampling and feature extraction on the first fusion feature map to obtain the second fusion feature map; and the third fusion model performs down-sampling and feature extraction on the second fusion feature map to obtain the second character feature map.
S30202, inputting the first fusion feature map into a second fusion model in the second convolutional layer, wherein the second fusion model performs down-sampling and feature extraction on the first fusion feature map to obtain a second fusion feature map; the second fusion model includes a second down-sampling model and a second ShuffleNet model.
Understandably, the second fusion model comprises the second down-sampling model and the second ShuffleNet model. The second down-sampling model is a down-sampling model with a second sampling parameter, which may be the same as or different from the first sampling parameter; it down-samples the first fusion feature map. The second ShuffleNet model is a neural network model based on the ShuffleNet model; it receives the output of the second down-sampling model and performs feature extraction on it. Its network structure can be set according to requirements, for example the network structure of ShuffleNet V1 or ShuffleNet V2; preferably, it is a network structure spliced from the network structures of seven ShuffleNet V2 units.
S30203, inputting the second fusion feature map into a third fusion model in the second convolutional layer, wherein the third fusion model performs down-sampling and feature extraction on the second fusion feature map to obtain the second character feature map; the third fusion model includes a third down-sampling model and a third ShuffleNet model.
Understandably, the third fusion model includes the third down-sampling model and the third ShuffleNet model. The third down-sampling model is a down-sampling model with a third sampling parameter, which may be the same as or different from the first and second sampling parameters; it down-samples the second fusion feature map. The third ShuffleNet model is a neural network model based on the ShuffleNet model; it receives the output of the third down-sampling model and performs feature extraction on it. Its network structure may be set according to requirements, for example the network structure of ShuffleNet V1 or ShuffleNet V2; preferably, it is a network structure spliced from the network structures of three ShuffleNet V2 units.
In this way, the down-sampling models with different channel parameters reduce the size of the feature map while adding channels (i.e. dimensions), extracting wider features and improving the fault tolerance of the whole model, while the ShuffleNet models with different convolution kernel parameters perform feature extraction and capture features more accurately, improving recognition accuracy and reliability.
S303, inputting the second character feature map into a pooling layer in the lightweight character recognition model, wherein the pooling layer pools the second character feature map to obtain a third character feature map.
Understandably, the method of pooling may be set according to requirements, for example, the pooling may be average pooling, maximum pooling, or the like, the pooling is used for performing dimension reduction processing on the second character feature map, and the third character feature map is a one-dimensional matrix array.
S304, inputting the third character feature map into a fully connected layer in the lightweight character recognition model, wherein the fully connected layer performs feature connection on the third character feature map to obtain a connection matrix.
Understandably, the feature connection maps the obtained feature values to positions in the sample label space: the feature vectors are weighted, summarized and connected. The fully connected layer performs feature connection on the third character feature map to obtain the connection matrix, an ordered one-dimensional matrix array. For example: after convolution with 160 three-dimensional 1 × 1 × 1 convolution kernels, the features are connected into a one-dimensional vector of 160 elements.
S305, inputting the connection matrix into an output layer in the lightweight character recognition model, and performing prediction classification processing on the connection matrix by the output layer to obtain a recognition result.
Understandably, the output layer performs prediction classification processing on the connection matrix, wherein the prediction classification processing is to classify and predict numerical values in the connection matrix, namely softmax processing, and finally obtain an identification result.
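Putting steps S303 to S305 together, the following is a minimal sketch of the pooling layer, the fully connected layer and the output layer; the 160-dimensional feature from the example above and the number of output classes are assumptions.

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    """A sketch of S303-S305: pooling, feature connection, prediction."""
    def __init__(self, channels: int = 160, num_classes: int = 100):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # S303: pool to a 1-D feature
        self.fc = nn.Linear(channels, num_classes)   # S304: feature connection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(x).flatten(1)                  # third character feature map
        logits = self.fc(x)                          # connection matrix
        return torch.softmax(logits, dim=1)          # S305: softmax prediction
```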
In this way, the character string area image is input into the first convolutional layer, which compresses it and raises its dimension to obtain the first character feature map; the first character feature map is input into the second convolutional layer (a convolutional layer based on the down-sampling model and the ShuffleNet model), which extracts character features to obtain the second character feature map; the second character feature map is input into the pooling layer and pooled into the third character feature map; the third character feature map is input into the fully connected layer for feature connection to obtain the connection matrix; and the connection matrix is input into the output layer, which performs prediction and classification to obtain the recognition result. By extracting character features, the character strings in a character string area image (whether printed or handwritten) can be recognized quickly and accurately, without splitting the string and recognizing characters one by one, and without separate handling of printing and handwriting; this greatly simplifies the network structure of the neural network, greatly reduces the capacity of the model, and makes the method easy to apply on mobile devices.
In an embodiment, as shown in fig. 8, before step S30, that is, before inputting each character string area image into the trained lightweight character recognition model based on the ShuffleNet model, the method includes:
S306, acquiring a character string image sample set; the character string image sample set contains a plurality of character string image samples, each character string image sample is associated with a character string label, and the character string image samples comprise print samples containing printed characters and handwritten samples containing handwritten characters.

Understandably, the character string image sample set includes at least one character string image sample, each associated with one character string label; the character string image samples include the print samples and the handwritten samples. The character string label is the character string contained in the corresponding sample; a print sample is an image of printed characters, and a handwritten sample is an image of characters written by hand.
S307, inputting the character string image sample into a deep convolution neural network model containing initial parameters.
Understandably, the initial parameter may be set according to requirements, for example, the initial parameter may be a random parameter, or a parameter that is migrated and learned through a neural network model of the same network structure.
S308, the deep convolutional neural network model extracts the character features of the character string image sample, and the sample result output by the deep convolutional neural network model according to the extracted character features is acquired.
Understandably, the sample result is a result of the deep convolutional neural network model which is identified and output according to the character features extracted from the character string image sample.
S309, determining a loss value according to the sample result of the character string image sample and the character string label of the character string image sample.
Understandably, the sample result of the character string image sample is compared with its character string label to determine the corresponding loss value; that is, the loss value is calculated through the loss function of the deep convolutional neural network model.
S3010, when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as a trained lightweight character recognition model.
Understandably, the convergence condition may be a condition that the loss value is smaller than a set threshold, that is, when the loss value is smaller than the set threshold, the deep convolutional neural network model after convergence is recorded as a lightweight character recognition model after training is completed.
S3011, when the loss value does not reach a preset convergence condition, iteratively updating initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and recording the converged deep convolutional neural network model as a trained lightweight character recognition model.
Understandably, the convergence condition may also be that the loss value is small and no longer decreases after 900 further iterations; in that case training is stopped, and the converged deep convolutional neural network model is recorded as the trained lightweight character recognition model.
Therefore, when the loss value has not reached the preset convergence condition, the initial parameters of the deep convolutional neural network model are updated and iterated continuously, drawing the output ever closer to the accurate recognition result and making the recognition results increasingly accurate.
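The training procedure of steps S306 to S3011 can be sketched as a plain loop. The optimizer, learning rate, cross-entropy loss and threshold below are assumptions; the patent only requires a loss value and a preset convergence condition.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 10, threshold: float = 0.05):
    """Train until the loss value reaches the assumed convergence condition."""
    criterion = nn.CrossEntropyLoss()                 # assumed loss function
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for images, labels in loader:                 # string image samples + labels
            optimizer.zero_grad()
            logits = model(images)                    # S308: sample result (class scores)
            loss = criterion(logits, labels)          # S309: loss against string labels
            if loss.item() < threshold:               # S3010: convergence condition reached
                return model                          # trained lightweight model
            loss.backward()
            optimizer.step()                          # S3011: iterate the initial parameters
    return model
```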
S40, judging whether the document information is the same as the document number to obtain a first judgment result, judging whether the identity card information is the same as the identity card number to obtain a second judgment result, and judging whether the bank card information is the same as the bank card number to obtain a third judgment result.
Understandably, the document information is compared with the document number, and the first judgment result is determined according to the comparison; the first judgment result is either "same" or "different", i.e. when the document information is the same as the document number, the first judgment result is "same". Likewise, the identity card information is compared with the identity card number to determine the second judgment result, and the bank card information is compared with the bank card number to determine the third judgment result, each of which is either "same" or "different".
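The comparison logic of step S40, together with the pass/fail decisions of the later steps S50 and S60, reduces to three string equality checks, sketched below with hypothetical function and parameter names.

```python
def audit(document_info: str, document_no: str,
          id_info: str, id_no: str,
          bank_info: str, bank_no: str) -> bool:
    """Return True only when all three judgment results are 'same' (S50);
    any 'different' result means the audit fails (S60)."""
    first = document_info == document_no
    second = id_info == id_no
    third = bank_info == bank_no
    return first and second and third
```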
S50, if the first judgment result, the second judgment result and the third judgment result are all "same", determining that the document image to be audited passes the audit.
Understandably, when the first judgment result, the second judgment result and the third judgment result are all "same", the document image to be audited is determined to pass the audit, the document image to be audited is marked as approved, and the marked document image is sent to the server.
The method thus recognizes at least three character string area images (including printed and handwritten character string area images) in the document image to be audited through a YOLO algorithm, inputs them into a lightweight character recognition model based on the ShuffleNet model for character feature extraction, obtains recognition results, and determines whether the audit passes according to those results. The character strings in a document are recognized and checked automatically, achieving automatic document auditing; the capacity of the model is greatly reduced and its structure simplified, so the method can be applied on mobile devices, and recognition accuracy and reliability as well as customer satisfaction are improved.
In an embodiment, as shown in fig. 3, after the step S40, that is, after the determining whether the bank card information is the same as the bank card number to obtain a third determination result, the method further includes:
and S60, if at least one of the first judgment result, the second judgment result and the third judgment result is different, determining that the document image to be audited is not approved.
Understandably, as long as any one of the first judgment result, the second judgment result and the third judgment result is "different", the document image to be audited is determined not to pass the audit, i.e. the document image to be audited is unqualified.
In this way, when at least one of the first, second and third judgment results is "different", the document image to be audited is determined not to pass the audit, improving auditing efficiency.

In an embodiment, a document auditing device is provided, and the document auditing device corresponds one-to-one to the document auditing method in the above embodiments. As shown in fig. 9, the document auditing device includes a receiving module 11, an obtaining module 12, a recognition module 13, a judging module 14 and a determining module 15.
The functional modules are explained in detail as follows:
the receiving module 11 is configured to, after receiving a document auditing instruction, obtain a document image to be audited that is associated with the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of an object and a bank card number of the object;
the obtaining module 12 is configured to input the document image to be audited into a character string area recognition model, and obtain, through the YOLO algorithm, at least three character string area images in the document image to be audited that are recognized by the character string area recognition model;
the recognition module 13 is configured to input each of the character string area images into a trained lightweight character recognition model, which performs character feature extraction on the character string area images and obtains the recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class and bank card information corresponding to the bank card class; the lightweight character recognition model is a neural network model based on the ShuffleNet model;
the judging module 14 is configured to judge whether the document information is the same as the document number to obtain a first judgment result, judge whether the identification card information is the same as the identification card number to obtain a second judgment result, and judge whether the bank card information is the same as the bank card number to obtain a third judgment result;
and the determining module 15 is configured to determine that the document image to be audited passes the audit if the first determination result, the second determination result, and the third determination result are the same.
For the specific definition of the document auditing device, reference may be made to the above definition of the document auditing method, which is not described herein again. All or part of the modules in the document auditing device can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a document auditing method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the document auditing method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the document auditing method of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division into the functional units and modules described above is illustrated; in practical applications, the functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, and some technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A document auditing method is characterized by comprising the following steps:
after receiving a document auditing instruction, acquiring a document image to be audited associated with the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of a subject, and a bank card number of the subject;
inputting the document image to be audited into a character string area identification model, and acquiring at least three character string area images in the document image to be audited, which are identified by the character string area identification model, through a YOLO algorithm;
inputting each character string area image into a trained lightweight character recognition model, where the lightweight character recognition model performs character feature extraction on the character string area image, and obtaining a recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class, and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class, and bank card information corresponding to the bank card class; and the lightweight character recognition model is a neural network model based on a ShuffleNet model;
judging whether the document information is the same as the document number to obtain a first judgment result, judging whether the identity card information is the same as the identity card number to obtain a second judgment result, and judging whether the bank card information is the same as the bank card number to obtain a third judgment result;
and if the first judgment result, the second judgment result, and the third judgment result all indicate a match, determining that the document image to be audited passes the audit.
2. The document auditing method of claim 1, wherein after judging whether the bank card information is the same as the bank card number to obtain the third judgment result, the method further comprises:
if at least one of the first judgment result, the second judgment result, and the third judgment result indicates a mismatch, determining that the document image to be audited fails the audit.
3. The document auditing method of claim 1, wherein the acquiring, through a YOLO algorithm, of a plurality of character string area images in the document image to be audited that are identified by the character string area identification model comprises:
performing, by a preprocessing model in the character string area identification model, grayscale processing on the document image to be audited to obtain a grayscale image;
inputting the grayscale image into a YOLO recognition model in the character string area identification model;
extracting, by the YOLO (You Only Look Once) recognition model, character string features in the grayscale image through the YOLO algorithm to obtain an identification area containing a character string in the grayscale image;
and cropping out the identification area as the character string area image.
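As a non-limiting illustration of the steps of claim 3, the following sketch converts the document image to grayscale with OpenCV and crops the areas returned by a text detector. The `yolo_model.predict` interface and its (x, y, w, h) box format are assumptions made for this sketch, not an API fixed by the claim.

```python
import cv2
import numpy as np

def extract_string_areas(document_image: np.ndarray, yolo_model) -> list:
    # Preprocessing model: grayscale processing of the document image.
    gray = cv2.cvtColor(document_image, cv2.COLOR_BGR2GRAY)

    # YOLO recognition model: assumed to return identification areas as
    # (x, y, w, h) bounding boxes around character strings.
    boxes = yolo_model.predict(gray)

    # Crop each identification area out as a character string area image.
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```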
4. The document auditing method of claim 1, wherein the lightweight character recognition model performing character feature extraction on the character string area image to obtain the recognition result output by the lightweight character recognition model according to the extracted character features comprises:
inputting the character string area image into a first convolution layer in the lightweight character recognition model, where the first convolution layer compresses the character string area image and raises its dimensionality to obtain a first character feature map;
inputting the first character feature map into a second convolution layer in the lightweight character recognition model, and performing character feature extraction on the first character feature map by the second convolution layer to obtain a second character feature map; the second convolutional layer is a convolutional layer based on a down-sampling model and a ShuffleNet model;
inputting the second character feature map into a pooling layer in the lightweight character recognition model, wherein the pooling layer is used for pooling the second character feature map to obtain a third character feature map;
inputting the third character feature map into a full connection layer in the lightweight character recognition model, wherein the full connection layer performs feature connection on the third character feature map to obtain a connection matrix;
and inputting the connection matrix into an output layer in the lightweight character recognition model, and performing prediction classification processing on the connection matrix by the output layer to obtain a recognition result.
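The layer order of claim 4 can be sketched in PyTorch as follows. All channel widths and the class count are illustrative assumptions, and `FusionBlock` stands for the downsampling-plus-ShuffleNet unit of claims 5 and 6 (matching sketches follow those claims below); this is a sketch of the described structure, not the exact model.

```python
import torch
import torch.nn as nn

class LightweightCharRecognizer(nn.Module):
    def __init__(self, num_classes: int = 100):  # class count is an assumption
        super().__init__()
        # First convolution layer: compress spatially and raise the dimension.
        self.first_conv = nn.Sequential(
            nn.Conv2d(1, 24, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(24),
            nn.ReLU(inplace=True),
        )
        # Second convolution layer: based on downsampling and ShuffleNet
        # models (three fusion blocks, per claim 5; sketched below).
        self.second_conv = nn.Sequential(
            FusionBlock(24, 48), FusionBlock(48, 96), FusionBlock(96, 192),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)    # pooling layer
        self.fc = nn.Linear(192, num_classes)  # full connection layer

    def forward(self, x):
        x = self.first_conv(x)         # first character feature map
        x = self.second_conv(x)        # second character feature map
        x = self.pool(x).flatten(1)    # third character feature map, flattened
        logits = self.fc(x)            # connection matrix
        # Output layer: softmax over the connection matrix yields the
        # prediction classification (apply at inference; losses take logits).
        return logits
```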
5. The document auditing method of claim 4, wherein the second convolutional layer performs character feature extraction on the first character feature map to obtain a second character feature map, and the method comprises the following steps:
inputting the first character feature map into a first fusion model in the second convolutional layer, wherein the first fusion model performs downsampling processing and feature extraction processing on the first character feature map to obtain a first fusion feature map; the first fusion model comprises a first downsampling model and a first ShuffleNet model;
inputting the first fusion feature map into a second fusion model in the second convolutional layer, and performing downsampling processing and feature extraction processing on the first fusion feature map by using the second fusion model to obtain a second fusion feature map; the second fusion model comprises a second downsampling model and a second ShuffleNet model;
inputting the second fusion feature map into a third fusion model in the second convolutional layer, where the third fusion model performs downsampling processing and feature extraction processing on the second fusion feature map to obtain the second character feature map; the third fusion model comprises a third downsampling model and a third ShuffleNet model.
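Continuing the sketch, one fusion model of claim 5 chains a downsampling model with a ShuffleNet-style unit. The `DownsampleModel` and `ShuffleUnit` classes referenced here are sketched after claim 6 below, and the composition shown is an assumption about how the two parts connect, not a detail fixed by the claim.

```python
import torch.nn as nn

class FusionBlock(nn.Module):
    """One fusion model: a downsampling model feeding a ShuffleNet model."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.downsample = DownsampleModel(in_channels, out_channels)
        self.shufflenet = ShuffleUnit(out_channels)

    def forward(self, x):
        # Downsampling processing, then character feature extraction.
        return self.shufflenet(self.downsample(x))
```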
6. The document auditing method of claim 5, wherein the first fusion model performs downsampling processing and feature extraction processing on the first character feature map to obtain a first fusion feature map, and comprises:
inputting the first character feature map into a general module in the first downsampling model and into an enhancement module in the first downsampling model;
performing, by the general module, feature map reduction processing on the first character feature map to obtain a general matrix, and performing, by the enhancement module, feature map reduction processing and enhancement processing on the first character feature map to obtain an enhancement matrix;
fusing the general matrix and the enhancement matrix to obtain a fusion matrix;
performing channel extraction processing on the fusion matrix according to a preset first extraction parameter to obtain a first feature map;
and inputting the first feature map into the first ShuffleNet model, and performing character feature extraction on the first feature map by the first ShuffleNet model to obtain the first fusion feature map.
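The two-branch downsampling of claim 6 and a simplified ShuffleNet unit might look as follows. The concrete branch designs (a plain strided convolution as the general module, a strided convolution plus an extra convolution as the enhancement module), the fusion by channel concatenation, and the 1x1 channel extraction convolution are all assumptions chosen to match the described data flow, not details fixed by the claim.

```python
import torch
import torch.nn as nn

class DownsampleModel(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # General module: reduces the feature map (assumed strided convolution).
        self.general = nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1)
        # Enhancement module: reduces the feature map and enhances it
        # (assumed extra convolution after the strided one).
        self.enhance = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1),
        )
        # Channel extraction according to a preset extraction parameter,
        # modeled here as a 1x1 convolution over the fused channels.
        self.extract = nn.Conv2d(2 * out_channels, out_channels, kernel_size=1)

    def forward(self, x):
        general_matrix = self.general(x)
        enhancement_matrix = self.enhance(x)
        fusion_matrix = torch.cat([general_matrix, enhancement_matrix], dim=1)
        return self.extract(fusion_matrix)  # the extracted feature map


def channel_shuffle(x, groups: int = 2):
    # The channel shuffle operation characteristic of ShuffleNet.
    n, c, h, w = x.size()
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).reshape(n, c, h, w))


class ShuffleUnit(nn.Module):
    """Simplified ShuffleNet-style unit: grouped 1x1 convolutions around a
    depthwise 3x3 convolution, with a channel shuffle in between."""
    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        self.pw1 = nn.Conv2d(channels, channels, 1, groups=groups)
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pw2 = nn.Conv2d(channels, channels, 1, groups=groups)
        self.relu = nn.ReLU(inplace=True)
        self.groups = groups

    def forward(self, x):
        out = self.relu(self.pw1(x))
        out = self.dw(channel_shuffle(out, self.groups))
        out = self.pw2(out)
        return self.relu(out + x)  # residual connection is an assumption
```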
7. The document auditing method of claim 1, wherein before inputting each character string area image into the trained lightweight character recognition model based on the ShuffleNet model, the method further comprises:
acquiring a character string image sample set; the character string image sample set contains a plurality of character string image samples, each character string image sample is associated with a character string label, and the character string image samples comprise printed samples containing printed characters and handwritten samples containing handwritten characters;
inputting the character string image sample into a deep convolution neural network model containing initial parameters;
performing, by the deep convolutional neural network model, character feature extraction on the character string image sample, and obtaining a sample result output by the deep convolutional neural network model according to the extracted character features;
determining a loss value according to the sample result of the character string image sample and the character string label associated with the character string image sample;
when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as a trained lightweight character recognition model;
and when the loss value does not reach a preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and recording the converged deep convolutional neural network model as a trained lightweight character recognition model.
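The training procedure of claim 7 maps naturally onto a standard supervised loop. The loss function, optimizer, learning rate, and the concrete convergence condition (average epoch loss below a threshold) in the sketch below are illustrative assumptions, and the model is assumed to return raw class scores.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_character_model(model: nn.Module, samples: DataLoader,
                          converge_eps: float = 1e-3, max_epochs: int = 100):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for _ in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in samples:        # printed and handwritten samples
            scores = model(images)            # sample result
            loss = criterion(scores, labels)  # loss vs. the character string label
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                  # iteratively update the parameters
            epoch_loss += loss.item()
        # Preset convergence condition (an assumption): stop once the
        # average loss over the sample set is small enough.
        if epoch_loss / len(samples) < converge_eps:
            break
    # The converged model is recorded as the trained lightweight
    # character recognition model.
    return model
```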
8. A document auditing device, comprising:
the receiving module is used for acquiring a document image to be audited associated with a document auditing instruction after receiving the document auditing instruction; the document auditing instruction comprises a document number, an identity card number of a subject, and a bank card number of the subject;
the acquisition module is used for inputting the document image to be audited into a character string area identification model and acquiring at least three character string area images in the document image to be audited, which are identified by the character string area identification model, through a YOLO algorithm;
the recognition module is used for inputting each character string area image into a trained lightweight character recognition model, where the lightweight character recognition model performs character feature extraction on the character string area image, and for acquiring a recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string type and the character string information of the character string area image; the character string types comprise a document class, an identity card class, and a bank card class; the character string information comprises document information corresponding to the document class, identity card information corresponding to the identity card class, and bank card information corresponding to the bank card class; and the lightweight character recognition model is a neural network model based on a ShuffleNet model;
the judging module is used for judging whether the document information is the same as the document number to obtain a first judgment result, judging whether the identity card information is the same as the identity card number to obtain a second judgment result, and judging whether the bank card information is the same as the bank card number to obtain a third judgment result;
and the determining module is used for determining that the document image to be audited passes the audit if the first judgment result, the second judgment result, and the third judgment result all indicate a match.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements a document auditing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the document auditing method according to any one of claims 1 to 7.
CN202010461277.XA 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium Active CN111666932B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010461277.XA CN111666932B (en) 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111666932A 2020-09-15
CN111666932B 2023-07-14

Family

ID=72384717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010461277.XA Active CN111666932B (en) 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111666932B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688784A (en) * 2017-08-23 2018-02-13 福建六壬网安股份有限公司 A kind of character identifying method and storage medium based on further feature and shallow-layer Fusion Features
WO2019071660A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Bill information identification method, electronic device, and readable storage medium
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712429A (en) * 2020-12-28 2021-04-27 中电金信软件有限公司 Remittance service auditing method, remittance service auditing device, computer equipment and storage medium
CN113065423A (en) * 2021-03-19 2021-07-02 国网电子商务有限公司 Ticket key information extraction method and device based on deep learning
CN113837287A (en) * 2021-09-26 2021-12-24 平安科技(深圳)有限公司 Certificate abnormal information identification method, device, equipment and medium
CN113837287B (en) * 2021-09-26 2023-08-29 平安科技(深圳)有限公司 Certificate abnormal information identification method, device, equipment and medium
CN115439850A (en) * 2022-10-08 2022-12-06 招商局通商融资租赁有限公司 Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN115439850B (en) * 2022-10-08 2024-06-04 招商局智融供应链服务有限公司 Method, device, equipment and storage medium for identifying image-text characters based on examination sheets
CN117875906A (en) * 2024-03-06 2024-04-12 青岛冠成软件有限公司 Electronic bill auditing method based on artificial intelligence
CN117875906B (en) * 2024-03-06 2024-06-04 青岛冠成软件有限公司 Electronic bill auditing method based on artificial intelligence

Also Published As

Publication number Publication date
CN111666932B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN111191539B (en) Certificate authenticity verification method and device, computer equipment and storage medium
CN112926654B (en) Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
CN111666932B (en) Document auditing method, device, computer equipment and storage medium
CN110705233B (en) Note generation method and device based on character recognition technology and computer equipment
CN111191568A (en) Method, device, equipment and medium for identifying copied image
CN111444986A (en) Building drawing component classification method and device, electronic equipment and storage medium
CN112699923A (en) Document classification prediction method and device, computer equipment and storage medium
CN114120299A (en) Information acquisition method, device, storage medium and equipment
CN112668462A (en) Vehicle loss detection model training method, vehicle loss detection device, vehicle loss detection equipment and vehicle loss detection medium
CN114357174B (en) Code classification system and method based on OCR and machine learning
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN113806613B (en) Training image set generation method, training image set generation device, computer equipment and storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN115187456A (en) Text recognition method, device, equipment and medium based on image enhancement processing
CN110956133A (en) Training method of single character text normalization model, text recognition method and device
CN114724162A (en) Training method and device of text recognition model, computer equipment and storage medium
CN115424001A (en) Scene similarity estimation method and device, computer equipment and storage medium
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN114283429A (en) Material work order data processing method, device, equipment and storage medium
CN112883956A (en) Text character recognition method and device and server
CN114529891A (en) Text recognition method, and training method and device of text recognition network
CN116778534B (en) Image processing method, device, equipment and medium
CN117058432B (en) Image duplicate checking method and device, electronic equipment and readable storage medium
CN113780131B (en) Text image orientation recognition method, text content recognition method, device and equipment
CN113449716B (en) Field positioning and classifying method, text image recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant