CN111666932B - Document auditing method, device, computer equipment and storage medium - Google Patents

Document auditing method, device, computer equipment and storage medium Download PDF

Info

Publication number
CN111666932B
CN111666932B (application CN202010461277.XA)
Authority
CN
China
Prior art keywords
character
model
character string
feature map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010461277.XA
Other languages
Chinese (zh)
Other versions
CN111666932A (en)
Inventor
唐子豪
刘莉红
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010461277.XA priority Critical patent/CN111666932B/en
Publication of CN111666932A publication Critical patent/CN111666932A/en
Application granted granted Critical
Publication of CN111666932B publication Critical patent/CN111666932B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a document auditing method, a document auditing apparatus, a computer device, and a storage medium. The document auditing method comprises the following steps: acquiring a document image to be audited that is associated with a document auditing instruction; inputting the document image into a character string region identification model and obtaining at least three character string region images via the YOLO algorithm; inputting each character string region image into a trained lightweight character recognition model, which extracts character features from the region images to obtain recognition results; judging whether the recognized document information matches the document number to obtain a first discrimination result, whether the recognized identity card information matches the identity card number to obtain a second discrimination result, and whether the recognized bank card information matches the bank card number to obtain a third discrimination result; and if all three discrimination results indicate a match, determining that the audit passes. The invention thereby achieves automatic recognition and verification of the character strings in a document, realizing automatic document auditing.

Description

Document auditing method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a document auditing method, apparatus, computer device, and storage medium.
Background
With the development of science and technology, more and more scenarios require the auditing of document and bill images. For example, in many settings, images of forms filled out by various personnel must be checked to determine whether the information in them is erroneous, which involves an extremely heavy workload. Handwritten characters often appear in these filled-out forms; at present they must be recognized manually, and a certain recognition error rate exists. In the prior art, document images are also recognized by OCR (Optical Character Recognition) technology. However, because conventional OCR recognizes characters one by one, its response time for character recognition (especially of handwritten characters) is long, the program capacity and load are large, and the burden falls on the server, creating a recognition-efficiency bottleneck. The server may become overloaded, which ultimately harms auditing efficiency and leaves customers dissatisfied.
Disclosure of Invention
The invention provides a document auditing method, apparatus, computer device, and storage medium that automatically recognize the character strings in a document and audit them, achieving automatic document auditing. The approach greatly reduces model capacity and simplifies model structure, so it can be applied on mobile devices, while improving recognition accuracy, reliability, and customer satisfaction.
A document auditing method, comprising:
after receiving a document auditing instruction, acquiring a document image to be audited that is associated with the instruction; the document auditing instruction comprises a document number, an identity card number of a subject, and a bank card number of the subject;
inputting the document image to be audited into a character string region identification model, and acquiring, via the YOLO algorithm, at least three character string region images identified by the model in the document image;
inputting each character string region image into a trained lightweight character recognition model, which extracts character features from the region image; obtaining the recognition result output by the model according to the extracted features; the recognition result comprises the character string category and the character string information of the region image; the character string categories comprise a document category, an identity card category, and a bank card category; the character string information comprises document information corresponding to the document category, identity card information corresponding to the identity card category, and bank card information corresponding to the bank card category; the lightweight character recognition model is a neural network model based on the ShuffleNet model;
judging whether the document information matches the document number to obtain a first discrimination result, whether the identity card information matches the identity card number to obtain a second discrimination result, and whether the bank card information matches the bank card number to obtain a third discrimination result;
and if the first, second, and third discrimination results all indicate a match, determining that the document image to be audited passes the audit.
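The three-way comparison in the final steps can be sketched as follows. This is a minimal illustration only; the function and field names are hypothetical, since the text specifies behavior rather than an API:

```python
def audit_document(recognized, expected):
    """Compare each recognized string against the value carried by the
    audit instruction; the document passes only if all three match.
    Both arguments are dicts keyed by string category (hypothetical keys)."""
    categories = ("document_number", "id_card_number", "bank_card_number")
    results = {c: recognized.get(c) == expected.get(c) for c in categories}
    # Audit passes only when the first, second, and third results all match.
    return all(results.values()), results
```

A single mismatched field (for example a mis-recognized bank card number) is enough to fail the audit, which mirrors the all-three-must-match condition above.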
A document auditing apparatus comprising:
a receiving module, configured to acquire, after receiving a document auditing instruction, a document image to be audited that is associated with the instruction; the document auditing instruction comprises a document number, an identity card number of a subject, and a bank card number of the subject;
an acquisition module, configured to input the document image to be audited into a character string region identification model, and to acquire, via the YOLO algorithm, at least three character string region images identified by the model in the document image;
a recognition module, configured to input each character string region image into a trained lightweight character recognition model, which extracts character features from the region image, and to obtain the recognition result output by the model according to the extracted features; the recognition result comprises the character string category and the character string information of the region image; the character string categories comprise a document category, an identity card category, and a bank card category; the character string information comprises document information corresponding to the document category, identity card information corresponding to the identity card category, and bank card information corresponding to the bank card category; the lightweight character recognition model is a neural network model based on the ShuffleNet model;
a judging module, configured to judge whether the document information matches the document number to obtain a first discrimination result, whether the identity card information matches the identity card number to obtain a second discrimination result, and whether the bank card information matches the bank card number to obtain a third discrimination result;
and a determining module, configured to determine that the document image to be audited passes the audit if the first, second, and third discrimination results all indicate a match.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the document auditing method described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the document auditing method described above.
According to the document auditing method, apparatus, computer device, and storage medium, at least three character string region images (including printed and handwritten character string region images) in the document image to be audited are identified by the YOLO algorithm; these region images are input into the lightweight character recognition model based on the ShuffleNet model for character feature extraction to obtain the recognition results, and whether the document passes the audit is determined from those results. The invention thus automatically recognizes and verifies the character strings in a document, achieving automatic document auditing; it greatly reduces model capacity and simplifies model structure, can be applied on mobile devices, and improves recognition accuracy, reliability, and customer satisfaction.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a schematic view of an application environment of a document auditing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a document auditing method in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of a document auditing method in accordance with another embodiment of the present invention;
FIG. 4 is a flowchart of step S20 of a document auditing method in accordance with an embodiment of the present invention;
FIG. 5 is a flowchart of step S30 of a document auditing method in accordance with an embodiment of the present invention;
FIG. 6 is a flow chart of step S302 of a document auditing method in an embodiment of the present invention;
FIG. 7 is a flow chart of step S30201 of a document auditing method in an embodiment of the present invention;
FIG. 8 is a flow chart of step S30 of a document auditing method in another embodiment of the present invention;
FIG. 9 is a functional block diagram of a document auditing apparatus in an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
The document auditing method provided by the invention can be applied in the application environment shown in fig. 1, in which a client (computer device) communicates with a server through a network. Clients (computer devices) include, but are not limited to, personal computers, notebook computers, smartphones, tablet computers, cameras, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a document auditing method is provided, and the technical scheme mainly includes the following steps S10-S50:
S10, after receiving a document auditing instruction, acquiring a document image to be audited that is associated with the instruction; the document auditing instruction comprises a document number, an identity card number of a subject, and a bank card number of the subject.
Understandably, after a staff member fills out a document and photographs it, the photographed document image to be audited is submitted for review, which triggers the document auditing instruction. The instruction is associated with the document image to be audited and comprises a document number, an identity card number of a subject, and a bank card number of the subject; the document image contains the filled-in content (both printed and handwritten) for the document number, the subject's identity card number, and the subject's bank card number. For example: when a user applies for a loan, a staff member must fill in the loan contract number, the borrower's identity card number, and the borrower's bank card number; when a staff member files a car insurance claim document for a customer, the claim document number, the customer's identity card number, the customer's bank card number, and so on must be filled in.
The document number has a preset fixed length; the subject's identity card number is an 18-character string composed of digits and letters, and the subject's bank card number is a 16-character string composed of digits.
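The fixed formats described above lend themselves to simple pre-checks before comparison. A minimal sketch using regular expressions; the allowance of a trailing X in the identity card number reflects common practice for Chinese ID numbers and is an assumption beyond the text, which says only "digits and letters":

```python
import re

# 17 digits followed by a digit or the letter X (assumed check-character rule).
ID_CARD_RE = re.compile(r"\d{17}[\dXx]")
# Exactly 16 digits, as the description states.
BANK_CARD_RE = re.compile(r"\d{16}")

def looks_like_id_card(s: str) -> bool:
    """True if s has the 18-character identity card number format."""
    return ID_CARD_RE.fullmatch(s) is not None

def looks_like_bank_card(s: str) -> bool:
    """True if s has the 16-digit bank card number format."""
    return BANK_CARD_RE.fullmatch(s) is not None
```

Running such checks on the recognition output before the equality comparison can catch obviously malformed strings early.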
S20, inputting the document image to be audited into a character string region identification model, and acquiring, via the YOLO algorithm, at least three character string region images identified by the model in the document image.
The YOLO (You Only Look Once) algorithm divides an input image into a grid and performs recognition over the grid cells to detect the regions of target objects. The character string region identification model is a trained YOLO-based neural network model. It extracts character string features from the document image to be audited, where a character string feature is the image feature of a string composed of consecutive digits or letters, and it identifies at least three character string region images in the document image according to those features. The character string region images comprise a document number image containing the document number content, an identity card number image containing the identity card number content, and a bank card number image containing the bank card number content; they include both printed character string region images (covering the region of a printed string) and handwritten character string region images (covering the region of a handwritten string).
In an embodiment, as shown in fig. 4, step S20 — that is, acquiring, via the YOLO algorithm, the character string region images identified by the character string region identification model in the document image to be audited — includes:
S201, performing gray-scale processing on the document image to be audited with a preprocessing model in the character string region identification model to obtain a gray-scale image.
Understandably, the character string region identification model includes the preprocessing model, which applies gray-scale processing to each pixel of the document image to be audited; the gray-scale image is the result of this processing. Gray-scale processing makes the character strings in the document image stand out more clearly, so recognition becomes more reliable.
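As an illustration of S201, per-pixel gray-scale conversion might look like the following. The BT.601 luminance weights used here are a common convention, not something the description prescribes:

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB image to an (H, W) uint8 gray-scale
    image, weighting each channel by the BT.601 luminance coefficients
    (an assumed choice; the text only says 'gray processing')."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb[..., :3] @ weights).astype(np.uint8)
```

The resulting single-channel image is what the YOLO recognition model receives in S202.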
S202, inputting the gray image into a YOLO recognition model in the character string region recognition model.
It is to be understood that the character string region identification model further includes the YOLO recognition model, a neural network model based on the YOLO algorithm. Its network structure may be chosen as required — for example, the YOLO v1, YOLO v2, or YOLO v3 network structure — and is preferably the YOLO v3 structure, so that the regions identified by the YOLO recognition model are more accurate and the detection is more efficient.
S203, extracting character string features from the gray-scale image with the YOLO recognition model via the YOLO algorithm, and obtaining the identification regions containing character strings in the gray-scale image.
Understandably, the character string features are the image features of strings composed of consecutive digits or letters; the character strings in the gray-scale image are identified by the YOLO algorithm, and each identified area in the gray-scale image is marked as an identification region.
S204, cropping each identification region to obtain the character string region images.
Understandably, each identification region found in the gray-scale image is cropped out, and the cropped region is taken as a character string region image.
By performing gray-scale processing on the document image to be audited with the preprocessing model to obtain a gray-scale image, inputting it into the YOLO recognition model of the character string region identification model, extracting the character string features via the YOLO algorithm to obtain the identification regions containing character strings, and cropping out the character string region images, the character string region images in the document image can be identified quickly and accurately, improving recognition accuracy and reliability.
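The crop of S204 reduces to array slicing once the identification regions are available. A sketch assuming (x, y, w, h) pixel boxes, a format the text does not fix — a YOLO-style detector could be post-processed into it:

```python
import numpy as np

def crop_regions(gray: np.ndarray, boxes):
    """Crop each identification region out of the gray-scale image.
    `boxes` is an iterable of (x, y, w, h) rectangles in pixel
    coordinates (assumed format); returns one array per region."""
    crops = []
    for x, y, w, h in boxes:
        # NumPy indexing is row-major: rows are y, columns are x.
        crops.append(gray[y:y + h, x:x + w].copy())
    return crops
```

Each returned crop is then fed individually to the lightweight character recognition model in S30.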
S30, inputting each character string region image into a trained lightweight character recognition model, which extracts character features from the region image; obtaining the recognition result output by the model according to the extracted features. The recognition result comprises the character string category and the character string information of the region image; the character string categories comprise a document category, an identity card category, and a bank card category; the character string information comprises document information corresponding to the document category, identity card information corresponding to the identity card category, and bank card information corresponding to the bank card category; the lightweight character recognition model is a neural network model based on the ShuffleNet model.
Because the printed character string region image for a given string is close in size to the corresponding handwritten region image, a tolerance range can be set as required, and the lightweight character recognition model is trained on samples containing character strings (identity card numbers, bank card numbers, document numbers, and so on) within that tolerance range. The model extracts character features from each character string region image (the document number image, the identity card number image, and the bank card number image) and predicts and outputs the recognition result from those features. It can therefore recognize printed and handwritten strings quickly, without a character-by-character recognition pass, which greatly simplifies the handling of printed and handwritten text and greatly reduces the network capacity of the model. The recognition result comprises the character string category and the character string information of the region image. The character string categories comprise a document category (strings containing document numbers), an identity card category (strings containing identity card numbers), and a bank card category (strings containing bank card numbers). The character string information comprises the document information (the recognized document number), the identity card information (the recognized identity card number), and the bank card information (the recognized bank card number).
The lightweight character recognition model is a neural network model based on the ShuffleNet model. The ShuffleNet model contains a basic ResNet-style lightweight structure; as a lightweight neural network, it greatly reduces the computation of the model while preserving accuracy and greatly shrinks the network's capacity, so it can run on mobile devices to reduce the capacity and operating load on the server (that is, inference runs on the mobile device, and the result is handed to the server for further processing). The network structure of the ShuffleNet model may be chosen as required — for example, the ShuffleNet V1 or ShuffleNet V2 structure. It may also be a structure that first applies a channel split into two branches; one branch passes through a 1×1 convolution, then a 3×3 depthwise convolution (DWConv, used mainly to reduce computation), then another 1×1 convolution; a shortcut connection then concatenates this branch's output with the other branch, and the final result is produced through a channel shuffle (an interleaving of the channel order).
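The channel shuffle mentioned above is a deterministic reshape-transpose-reshape permutation (as defined by ShuffleNet), not a random reordering. A minimal sketch, independent of any particular framework:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across `groups` so information mixes between
    the two branches after concatenation.  `x` has shape (N, C, H, W)
    and C must be divisible by `groups`."""
    n, c, h, w = x.shape
    assert c % groups == 0
    y = x.reshape(n, groups, c // groups, h, w)
    y = y.transpose(0, 2, 1, 3, 4)  # swap the group and per-group axes
    return y.reshape(n, c, h, w)
```

With 4 channels and 2 groups, channels [0, 1, 2, 3] become [0, 2, 1, 3] — the first channel of each group is emitted first, then the second, and so on.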
In an embodiment, as shown in fig. 5, step S30 — that is, the lightweight character recognition model performing character feature extraction on the character string region image and the recognition result being obtained from the extracted features — includes:
S301, inputting the character string region image into a first convolution layer of the lightweight character recognition model; the first convolution layer compresses the image and raises its dimension to obtain a first character feature map.
It may be appreciated that the lightweight character recognition model includes a first convolution layer, a second convolution layer, a pooling layer, a fully connected layer, and an output layer. The first convolution layer may be configured as required — for example, as a single convolution layer (3×3×24 kernel, stride 2), or as a convolution layer (3×3 kernel) followed by a 3×3 max-pooling layer, and so on. Given the characteristics of characters, the first convolution layer is preferably a single convolution layer. It compresses and raises the dimension of the character string region image: applying the 3×3×24 kernel reduces the spatial size of the image while increasing its channel dimension (typically 3) to 24, producing the first character feature map, which is a multi-dimensional matrix of numerical values.
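The size reduction performed by the stride-2 first convolution layer follows the standard convolution output-size formula. The padding of 1 in this sketch is our assumption, since the description gives only the kernel size and stride:

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Spatial output size of a convolution (floor formula).  With the
    defaults -- 3x3 kernel, stride 2, assumed padding 1 -- the feature
    map is halved, while the channel count (3 -> 24 here) is set by the
    number of kernels, not by this formula."""
    return (size + 2 * pad - kernel) // stride + 1
```

For instance, a 224-pixel-wide region would come out 112 wide after this layer, with 24 channels instead of 3.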
S302, inputting the first character feature map into a second convolution layer of the lightweight character recognition model; the second convolution layer performs character feature extraction on the first character feature map to obtain a second character feature map. The second convolution layer is a convolution layer based on a downsampling model and the ShuffleNet model.
As can be appreciated, the second convolution layer includes downsampling models with a number of different channel parameters and ShuffleNet models with a number of different convolution-kernel parameters. The downsampling model reduces the feature map size while adding channels (the "dimensions" referred to throughout this text), extracting broader features and thus improving the fault tolerance of the whole model; the main function of the ShuffleNet model is feature extraction, so the features can be captured more accurately. The character features are the image features of printed and handwritten digits and letters, and further include image length features; the second character feature map is the matrix array obtained by extracting the character features of the first character feature map.
In an embodiment, as shown in fig. 6, step S302 — that is, inputting the first character feature map into the second convolution layer of the lightweight character recognition model, the second convolution layer performing character feature extraction on it to obtain a second character feature map — includes:
S30201, inputting the first character feature map into a first fusion model of the second convolution layer; the first fusion model performs downsampling and feature extraction on the first character feature map to obtain a first fusion feature map. The first fusion model includes a first downsampling model and a first ShuffleNet model.
The second convolution layer includes a first fusion model, a second fusion model, and a third fusion model, all constructed from a downsampling model and a ShuffleNet model. The first fusion model includes the first downsampling model and the first ShuffleNet model; the first downsampling model is a downsampling model configured with first channel parameters. The downsampling model is a preset convolutional neural network model comprising a general module and an enhancement module, each of which processes the input feature map to increase its channel count and highlight the character features; the first channel parameters comprise all parameters of the general module and the enhancement module. As detailed in steps S302011-S302012, the general module combines a 3×3 convolution with a 1×1 convolution, and the enhancement module prepends an additional 1×1 convolution, which helps prevent features from being lost during downsampling.
The first ShuffleNet model is a neural network model based on the ShuffleNet model; it receives the output of the first downsampling model and performs feature extraction on it, the extraction being of the character features. Its network structure may be chosen as required — for example, the ShuffleNet V1 or ShuffleNet V2 structure — and is preferably a structure formed by splicing three ShuffleNet V2 units.
In an embodiment, as shown in fig. 7, step S30201, that is, inputting the first character feature map into the first fusion model in the second convolution layer, where the first fusion model performs downsampling processing and feature extraction processing on the first character feature map to obtain a first fusion feature map, includes:
S302011, inputting the first character feature map to a general module in the first downsampling model and inputting the first character feature map to an enhancement module in the first downsampling model.
It is understood that the general module includes a 3×3 convolution kernel with a step size of 2 and a 1×1 convolution kernel with a step size of 1, and the enhancement module includes a 1×1 convolution kernel with a step size of 1, a 3×3 convolution kernel with a step size of 2, and a 1×1 convolution kernel with a step size of 1; that is, the enhancement module adds a 1×1 convolution kernel with a step size of 1 before the general module, and the general module and the enhancement module respectively process the first character feature map.
S302012, the general module performs the feature map shrinking process on the first character feature map to obtain a general matrix, and the enhancement module performs the feature map shrinking process and the enhancement process on the first character feature map to obtain an enhancement matrix.
The feature map shrinking processing performs convolution on the first character feature map so as to reduce the size of the first character feature map; the general matrix is the matrix array obtained after the general module applies the feature map shrinking processing to the first character feature map. The enhancement processing performs convolution on the first character feature map through a 1×1 convolution kernel with a step size of 1 to highlight the character feature, and the enhancement matrix is the matrix array obtained after the enhancement module applies the feature map shrinking processing and the enhancement processing to the first character feature map.
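The effect of the feature map shrinking processing, namely that a convolution with a step size of 2 roughly halves the spatial size, can be sketched as follows. This is an illustrative single-channel implementation with a hypothetical 8×8 input and a hypothetical averaging kernel, not the model's actual parameters.

```python
import numpy as np

def conv2d(fmap, kernel, stride):
    """Naive single-channel 2D convolution (no padding), used only to
    show how a stride-2 kernel shrinks the feature map."""
    h, w = fmap.shape
    k = kernel.shape[0]
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = fmap[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

fmap = np.arange(64, dtype=float).reshape(8, 8)  # hypothetical first character feature map
kernel = np.ones((3, 3)) / 9.0                   # hypothetical 3x3 averaging kernel
shrunk = conv2d(fmap, kernel, stride=2)          # step size 2: 8x8 -> 3x3
```

With a step size of 2 the output grid skips every other position, so the spatial size is roughly halved, which is the shrinking effect described above.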
S302013, fusing the general matrix and the enhancement matrix to obtain a fusion matrix.
Understandably, the fusion is a merge between channels. For example, fusing one 2-channel feature map with another 2-channel feature map yields a 4-channel feature map. Channel merging is performed on the general matrix and the enhancement matrix to obtain the fusion matrix, where the channel number of the fusion matrix is the sum of the channel number of the general matrix and the channel number of the enhancement matrix.
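The channel merge described above is a plain concatenation along the channel axis; a minimal sketch with hypothetical 2-channel matrices:

```python
import numpy as np

# Hypothetical 2-channel outputs of the general module and the enhancement module
general_matrix = np.random.rand(2, 4, 4)
enhancement_matrix = np.random.rand(2, 4, 4)

# Fusion merges between channels: 2 channels + 2 channels -> 4 channels
fusion_matrix = np.concatenate([general_matrix, enhancement_matrix], axis=0)
```

No values are mixed by the fusion itself; the fused matrix simply stacks the two branches so later layers can draw on both.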
S302014, performing channel extraction processing on the fusion matrix according to a preset first extraction parameter to obtain a first feature map.
Understandably, the first extraction parameter is the number of channels to be drawn at random from all the channels. The channel extraction processing randomly extracts that number of channels from the fusion matrix according to the first extraction parameter, and the first feature map is the matrix array obtained after channel extraction is performed on the fusion matrix.
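A minimal sketch of the channel extraction processing, assuming a hypothetical 4-channel fusion matrix and an extraction parameter of 2 (both chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
fusion_matrix = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)  # hypothetical 4-channel fusion matrix

first_extraction_parameter = 2  # hypothetical: number of channels to draw at random
chosen = rng.choice(fusion_matrix.shape[0], size=first_extraction_parameter, replace=False)
first_feature_map = fusion_matrix[chosen]   # keep only the extracted channels
```

The extraction keeps the spatial size unchanged and only reduces the channel count, which lowers the computation passed to the first ShuffleNet model.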
S302015, inputting the first feature map into the first ShuffleNet model, and extracting character features of the first feature map by the first ShuffleNet model to obtain the first fusion feature map.
Understandably, the first ShuffleNet model is a neural network model formed by three ShuffleNet models with different convolution kernels; its network structure is formed by splicing three ShuffleNet V2 network structures. The first ShuffleNet model performs the character feature extraction on the first feature map to obtain the first fusion feature map.
In this way, the first character feature map is input into the first fusion model in the second convolution layer, and the first fusion model performs downsampling processing and feature extraction processing on the first character feature map to obtain a first fusion feature map; the first fusion feature map is input into the second fusion model in the second convolution layer, and the second fusion model performs downsampling processing and feature extraction processing on the first fusion feature map to obtain a second fusion feature map; the second fusion feature map is input into the third fusion model in the second convolution layer, and the third fusion model performs downsampling processing and feature extraction processing on the second fusion feature map to obtain a second character feature map. Therefore, downsampling processing and character feature extraction on the first character feature map are achieved through the fusion of the downsampling model and the ShuffleNet model, the character features are highlighted, and recognition accuracy, reliability and efficiency are improved.
S30202, inputting the first fusion feature map into a second fusion model in the second convolution layer, and performing downsampling and feature extraction on the first fusion feature map by the second fusion model to obtain a second fusion feature map; the second fusion model includes a second downsampling model and a second ShuffleNet model.
The second fusion model may include the second downsampling model and the second ShuffleNet model, where the second downsampling model is a downsampling model configured with a second sampling parameter; the second sampling parameter may be the same as or different from the first sampling parameter. The second downsampling model performs downsampling processing on the first fusion feature map. The second ShuffleNet model is a neural network model based on the ShuffleNet model; it receives an output result of the second downsampling model and performs feature extraction processing on the output result. The network structure of the second ShuffleNet model may be set according to requirements; for example, it may be the network structure of ShuffleNet V1 or ShuffleNet V2, and preferably, the network structure of the second ShuffleNet model is formed by splicing three ShuffleNet V2 network structures.
S30203, inputting the second fusion feature map into a third fusion model in the second convolution layer, and performing downsampling and feature extraction on the second fusion feature map by the third fusion model to obtain a second character feature map; the third fusion model includes a third downsampling model and a third ShuffleNet model.
The third fusion model may include the third downsampling model and the third ShuffleNet model, where the third downsampling model is a downsampling model configured with a third sampling parameter; the third sampling parameter may be the same as or different from the first sampling parameter or the second sampling parameter. The third downsampling model performs downsampling processing on the second fusion feature map. The third ShuffleNet model is a neural network model based on the ShuffleNet model; it receives an output result of the third downsampling model and performs feature extraction processing on the output result. The network structure of the third ShuffleNet model may be set according to requirements; for example, it may be the network structure of ShuffleNet V1 or ShuffleNet V2, and preferably, the network structure of the third ShuffleNet model is formed by splicing three ShuffleNet V2 network structures.
According to the invention, the feature map size is reduced through downsampling models with different channel parameters while channels (namely dimensions) are added, so that wider features are extracted and the fault tolerance of the whole model is improved; features are extracted through ShuffleNet models with different convolution kernel parameters, so that features can be captured more accurately, improving recognition accuracy and reliability.
S303, inputting the second character feature map into a pooling layer in the lightweight character recognition model, and pooling the second character feature map by the pooling layer to obtain a third character feature map.
It can be appreciated that the pooling method can be set according to requirements, for example, the pooling process can be average pooling, maximum pooling, and the like, and the pooling process is used for performing dimension reduction on the second character feature map, and the third character feature map is a one-dimensional matrix array.
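The dimension reduction performed by the pooling layer can be sketched as follows: global average pooling and global maximum pooling each collapse a channel's spatial grid to a single value, yielding a one-dimensional array. The 2-channel 4×4 input is a hypothetical example.

```python
import numpy as np

second_map = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # hypothetical second character feature map

# Each channel is reduced to one value, so the result is one-dimensional
avg_pooled = second_map.mean(axis=(1, 2))   # average pooling
max_pooled = second_map.max(axis=(1, 2))    # maximum pooling
```

Either variant produces a vector with one entry per channel, which is the one-dimensional third character feature map passed to the full connection layer.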
S304, inputting the third character feature map into a full connection layer in the lightweight character recognition model, and performing feature connection on the third character feature map by the full connection layer to obtain a connection matrix.
The feature connection maps the obtained feature vector values to the sample label space and performs weighted summarization, connecting the feature vectors. Feature connection is performed on the third character feature map through the full connection layer to obtain the connection matrix, where the connection matrix is an ordered one-dimensional matrix array. For example, 160 vector groups are connected into one dimension after convolution by 160 1×1 three-dimensional convolution kernels.
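The weighted summarization of a full connection layer is a matrix-vector product plus a bias. A minimal sketch with hypothetical weights mapping a 3-element pooled feature map to a 2-entry label space:

```python
import numpy as np

third_map = np.array([1.0, 2.0, 3.0])        # hypothetical pooled one-dimensional feature map
weights = np.array([[0.1, 0.2, 0.3],          # hypothetical weights mapping features
                    [0.4, 0.5, 0.6]])         # to a 2-class label space
bias = np.array([0.0, 0.1])

# Weighted summarization: each output entry is a weighted sum of all features
connection_matrix = weights @ third_map + bias
```

Every output entry draws on every input feature, which is what "connecting" the feature vectors means here.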
S305, inputting the connection matrix into an output layer in the lightweight character recognition model, and performing prediction classification processing on the connection matrix by the output layer to obtain a recognition result.
Understandably, the output layer performs a prediction classification process on the connection matrix, where the prediction classification process is to classify and predict the numerical values in the connection matrix, that is, a softmax process, and finally obtain the recognition result.
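The softmax processing mentioned above converts the connection matrix into class probabilities, and the prediction is the class with the largest probability. A minimal sketch with a hypothetical 3-entry connection matrix:

```python
import numpy as np

connection_matrix = np.array([1.0, 3.0, 0.5])  # hypothetical output-layer input

def softmax(x):
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(connection_matrix)        # probabilities summing to 1
predicted_class = int(np.argmax(probs))   # prediction classification result
```

The probabilities always sum to 1, so the output can be read directly as the model's confidence over the candidate character classes.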
In this way, the character string region image is input into the first convolution layer, which performs compression and dimension-increasing processing on the character string region image to obtain a first character feature map; the first character feature map is input into the second convolution layer, which performs character feature extraction on the first character feature map to obtain a second character feature map, the second convolution layer being a convolution layer based on a downsampling model and a ShuffleNet model; the second character feature map is input into the pooling layer and pooled to obtain a third character feature map; the third character feature map is input into the full connection layer for feature connection to obtain a connection matrix; and the connection matrix is input into the output layer, which performs prediction classification processing on the connection matrix to obtain a recognition result. Character strings in a character string region image (including a printed character string region image and a handwritten character string region image) are thus rapidly and accurately recognized according to the character features, without character-by-character recognition after character splitting and without separate recognition processing for printed and handwritten characters, which greatly simplifies the network structure of the neural network, greatly reduces the capacity of the model, and makes the method easy to apply to mobile devices.
In one embodiment, as shown in fig. 8, before the step S30, that is, before the step of inputting each of the character string region images into the lightweight character recognition model based on the ShuffleNet model, the method includes:
S306, acquiring a character string image sample set; the character string image sample set contains a plurality of character string image samples, each character string image sample is associated with one character string label, and the character string image samples include printing samples containing printed characters and handwriting samples containing handwritten characters.
The character string image sample set includes at least one character string image sample, each character string image sample is associated with one character string label, the character string image sample includes the print sample and the handwriting sample, the character string label is a character string included in the corresponding character string image sample, the print sample is an image with printed characters, and the handwriting sample is an image with manually handwritten characters.
S307, inputting the character string image sample into a deep convolutional neural network model containing initial parameters.
It is understood that the initial parameters may be set according to requirements, for example, the initial parameters may be random parameters or parameters learned by migration in a neural network model of the same network structure.
S308, the deep convolutional neural network model extracts character features of the character string image samples, and a sample result output by the deep convolutional neural network model is obtained according to the extracted character features.
Understandably, the sample result is a result of the deep convolutional neural network model, which is identified and output according to the character features extracted from the character string image sample.
S309, confirming a loss value according to the sample result of the character string image sample and the character string label of the character string image sample.
Understandably, the corresponding loss value is determined by comparing the sample result of the character string image sample with its character string label; that is, the loss value is calculated by the loss function of the deep convolutional neural network model.
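The patent does not name the loss function, so the following is an illustrative sketch only, assuming a cross-entropy loss (a common choice for classification) with hypothetical predicted probabilities and a hypothetical label index:

```python
import numpy as np

# Hypothetical predicted class probabilities (sample result) for one
# character, and the index of the ground-truth character label
sample_result = np.array([0.1, 0.7, 0.2])
label_index = 1

# Cross-entropy loss: the negative log-probability assigned to the label
loss_value = -np.log(sample_result[label_index])
```

The loss shrinks toward zero as the probability assigned to the correct label approaches 1, which is what the iterative updates in the following steps drive toward.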
S3010, recording the converged deep convolutional neural network model as a trained lightweight character recognition model when the loss value reaches a preset convergence condition.
It is understood that the convergence condition may be that the loss value is smaller than a set threshold; that is, when the loss value is smaller than the set threshold, the converged deep convolutional neural network model is recorded as the trained lightweight character recognition model.
S3011, when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and recording the converged deep convolutional neural network model as the trained lightweight character recognition model.
Understandably, the convergence condition may also be that the loss value becomes very small and no longer decreases after 900 iterations of calculation; that is, when the loss value becomes very small and no longer decreases after 900 iterations, training is stopped, and the converged deep convolutional neural network model is recorded as the trained lightweight character recognition model.
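The iterate-until-the-loss-stops-falling logic of steps S3010 and S3011 can be sketched as follows. This is a generic training-loop skeleton, not the patent's implementation; `model_step`, `threshold` and `patience` are hypothetical names and settings.

```python
def train(model_step, threshold=1e-3, patience=900, max_iters=10_000):
    """Iterate parameter updates until the loss stops falling.

    model_step is assumed to run one update and return the loss;
    patience mirrors the 900-iteration condition in the text."""
    best_loss = float("inf")
    stale = 0
    for _ in range(max_iters):
        loss = model_step()
        if best_loss - loss < threshold:    # loss no longer falling
            stale += 1
            if stale >= patience:
                break                        # record the model as converged
        else:
            stale = 0
        best_loss = min(best_loss, loss)
    return best_loss
```

The loop stops either when the loss has been flat for `patience` consecutive iterations or when the iteration budget runs out.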
Therefore, when the loss value does not reach the preset convergence condition, the initial parameters of the deep convolutional neural network model are continuously updated and iterated, so that the output continuously approaches the accurate recognition result and the accuracy of the recognition result becomes higher and higher.
S40, judging whether the bill information is the same as the bill number to obtain a first judging result, judging whether the identity card information is the same as the identity card number to obtain a second judging result, and judging whether the bank card information is the same as the bank card number to obtain a third judging result.
Understandably, the bill information is compared with the bill number, and the first discrimination result is determined according to the comparison result; the first discrimination result includes same and different, that is, when the bill information is identical to the bill number, the first discrimination result is same. Likewise, the identity card information is compared with the identity card number to determine the second discrimination result, which is same when the identity card information is identical to the identity card number; and the bank card information is compared with the bank card number to determine the third discrimination result, which is same when the bank card information is identical to the bank card number.
S50, if the first discrimination result, the second discrimination result and the third discrimination result are all the same, determining that the to-be-checked document image passes the audit.
Understandably, when the first discrimination result, the second discrimination result and the third discrimination result are the same, determining that the to-be-checked document image is checked and passed, marking the to-be-checked document image as checked and passed, and sending the marked to-be-checked document image to a server.
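The comparisons in steps S40 and S50 can be sketched as follows; the function and argument names are hypothetical, chosen only to mirror the three discrimination results described above:

```python
def audit(bill_info, bill_number, id_info, id_number, bank_info, bank_number):
    """Sketch of steps S40/S50: compare each recognized character string
    with the corresponding number carried by the audit instruction."""
    first = "same" if bill_info == bill_number else "different"
    second = "same" if id_info == id_number else "different"
    third = "same" if bank_info == bank_number else "different"
    passed = first == second == third == "same"  # audit passes only if all three match
    return first, second, third, passed
```

A single mismatch in any of the three comparisons is enough to fail the audit, matching the determination rule in the following steps.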
The invention thus identifies at least three character string region images (including a printed character string region image and a handwritten character string region image) in a to-be-audited document image through a YOLO algorithm, inputs them into a lightweight character recognition model based on the ShuffleNet model for character feature extraction to obtain a recognition result, and determines whether the audit passes according to the recognition result. Character strings in a bill can thereby be automatically recognized and checked, achieving the effect of automatic bill auditing; the capacity of the model is greatly reduced and its structure is simplified, so the method can be applied to mobile devices, improving recognition accuracy and reliability as well as customer satisfaction.
In an embodiment, as shown in fig. 3, after the step S40, that is, after the step of determining whether the bank card information is the same as the bank card number, a third determination result is obtained, the method further includes:
S60, if at least one of the first discrimination result, the second discrimination result and the third discrimination result is different, determining that the to-be-checked document image fails the audit.
Understandably, if any one of the first, second and third discrimination results is different, the to-be-checked document image is determined to have failed the audit, which indicates that the to-be-checked document image does not meet the requirements.
In this way, when at least one of the first, second and third discrimination results is different, the to-be-checked document image is determined to have failed the audit, which improves auditing efficiency. In an embodiment, a document auditing device is provided, where the document auditing device corresponds one-to-one to the document auditing method in the above embodiment. As shown in fig. 9, the document auditing apparatus includes a receiving module 11, an acquiring module 12, an identifying module 13, a judging module 14, and a determining module 15.
The functional modules are described in detail as follows:
The receiving module 11 is used for acquiring a to-be-audited bill image associated with a bill auditing instruction after receiving the bill auditing instruction; the bill auditing instruction comprises a bill number, an identity card number of an object and a bank card number of the object;
The obtaining module 12 is configured to input the to-be-checked document image into a character string region identification model, and obtain at least three character string region images in the to-be-checked document image identified by the character string region identification model through a YOLO algorithm;
The recognition module 13 is configured to input each of the character string region images into a trained lightweight character recognition model, where the lightweight character recognition model performs character feature extraction on the character string region image, and obtains a recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string category and character string information of the character string region image; the character string category comprises bill categories, identity card categories and bank card categories; the character string information comprises bill information corresponding to the bill category, identity card information corresponding to the identity card category and bank card information corresponding to the bank card category; the lightweight character recognition model is a neural network model based on a ShuffleNet model;
The judging module 14 is configured to judge whether the bill information is the same as the bill number to obtain a first discrimination result, judge whether the identity card information is the same as the identity card number to obtain a second discrimination result, and judge whether the bank card information is the same as the bank card number to obtain a third discrimination result;
The determining module 15 is configured to determine that the to-be-checked document image passes the audit if the first, second and third discrimination results are all the same.
For specific limitations of the document auditing apparatus, reference may be made to the above limitations of the document auditing method, and will not be described in detail herein. The modules in the document auditing device can be realized in whole or in part by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a document auditing method.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the document auditing method of the above embodiments when the computer program is executed by the processor.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor implements the document auditing method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A document auditing method, comprising:
after receiving a bill auditing instruction, acquiring a to-be-audited document image associated with the bill auditing instruction; the bill auditing instruction comprises a bill number, an identity card number of an object and a bank card number of the object;
Inputting the to-be-audited document image into a character string region identification model, and acquiring at least three character string region images in the to-be-audited document image identified by the character string region identification model through a YOLO algorithm;
inputting each character string region image into a trained lightweight character recognition model, wherein the lightweight character recognition model extracts character features of the character string region images, and obtains recognition results output by the lightweight character recognition model according to the extracted character features; the recognition result comprises the character string category and character string information of the character string area image; the character string category comprises bill categories, identity card categories and bank card categories; the character string information comprises bill information corresponding to the bill class, identity card information corresponding to the identity card class and bank card information corresponding to the bank card class; the lightweight character recognition model is a neural network model based on a ShuffleNet model;
judging whether the bill information is the same as the bill number or not to obtain a first judging result, judging whether the identity card information is the same as the identity card number or not to obtain a second judging result, and judging whether the bank card information is the same as the bank card number or not to obtain a third judging result;
And if the first discrimination result, the second discrimination result and the third discrimination result are all the same, determining that the to-be-audited document image passes the audit.
2. The document auditing method of claim 1, wherein the determining whether the bank card information is the same as the bank card number, after obtaining a third determination result, further comprises:
and if at least one of the first discrimination result, the second discrimination result and the third discrimination result is different, determining that the to-be-audited document image fails the audit.
3. The document auditing method of claim 1, wherein the obtaining, by a YOLO algorithm, a plurality of character string region images in the document image to be audited identified by the character string region identification model includes:
the preprocessing model in the character string region recognition model carries out gray processing on the to-be-audited document image to obtain a gray image;
inputting the gray image into a YOLO recognition model in the character string region recognition model;
extracting character string features in the gray level image by the YOLO identification model through a YOLO algorithm to obtain an identification area containing the character string in the gray level image;
And intercepting the identification area as the character string area image.
4. The document auditing method of claim 1, wherein the lightweight character recognition model performs character feature extraction on the character string region image, and obtaining a recognition result output by the lightweight character recognition model according to the extracted character features comprises:
inputting the character string region image into a first convolution layer in the lightweight character recognition model, and compressing and dimension-increasing the character string region image by the first convolution layer to obtain a first character feature map;
inputting the first character feature map into a second convolution layer in the lightweight character recognition model, and extracting character features of the first character feature map by the second convolution layer to obtain a second character feature map; the second convolution layer is a convolution layer based on a downsampling model and a ShuffleNet model;
inputting the second character feature map into a pooling layer in the lightweight character recognition model, and pooling the second character feature map by the pooling layer to obtain a third character feature map;
inputting the third character feature map into a full connection layer in the lightweight character recognition model, and performing feature connection on the third character feature map by the full connection layer to obtain a connection matrix;
And inputting the connection matrix into an output layer in the lightweight character recognition model, and performing prediction classification processing on the connection matrix by the output layer to obtain a recognition result.
5. The document auditing method of claim 4, wherein the second convolution layer performs character feature extraction on the first character feature map to obtain a second character feature map, including:
inputting the first character feature map into a first fusion model in the second convolution layer, and performing downsampling and feature extraction on the first character feature map by the first fusion model to obtain a first fusion feature map; the first fusion model comprises a first downsampling model and a first ShuffleNet model;
inputting the first fusion feature map into a second fusion model in the second convolution layer, and performing downsampling and feature extraction on the first fusion feature map by the second fusion model to obtain a second fusion feature map; the second fusion model comprises a second downsampling model and a second ShuffleNet model;
inputting the second fusion feature map into a third fusion model in the second convolution layer, and performing downsampling and feature extraction on the second fusion feature map by the third fusion model to obtain a second character feature map; the third fusion model includes a third downsampling model and a third ShuffleNet model.
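Claim 5 chains three fusion models, each pairing a downsampling model with a ShuffleNet model. A minimal sketch, assuming each pair can be reduced to a strided convolution (downsampling) followed by a pointwise convolution (feature extraction); the real ShuffleNet blocks with channel shuffle are simplified away here:

```python
import torch
import torch.nn as nn

def fusion_stage(in_ch, out_ch):
    """Hypothetical stand-in for one 'downsampling model + ShuffleNet model' pair."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),  # downsampling: halves H and W
        nn.Conv2d(out_ch, out_ch, 1),                      # pointwise feature extraction
        nn.ReLU(inplace=True),
    )

# second convolution layer = first, second, and third fusion models in sequence
second_conv_layer = nn.Sequential(
    fusion_stage(24, 48),   # first fusion model  -> first fusion feature map
    fusion_stage(48, 96),   # second fusion model -> second fusion feature map
    fusion_stage(96, 192),  # third fusion model  -> second character feature map
)

fmap = second_conv_layer(torch.randn(1, 24, 32, 128))  # first character feature map in
```

Each stage halves the spatial resolution, so three stages reduce a 32x128 map to 4x16 while the channel count grows.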
6. The document auditing method of claim 5, wherein said performing, by the first fusion model, downsampling and feature extraction on the first character feature map to obtain the first fusion feature map comprises:
inputting the first character feature map into a generic module of the first downsampling model and into an enhancement module of the first downsampling model;
performing, by the generic module, feature map shrinking processing on the first character feature map to obtain a generic matrix, and performing, by the enhancement module, feature map shrinking processing and enhancement processing on the first character feature map to obtain an enhancement matrix;
fusing the generic matrix and the enhancement matrix to obtain a fusion matrix;
carrying out channel extraction processing on the fusion matrix according to a preset first extraction parameter to obtain a first feature map;
and inputting the first feature map into the first ShuffleNet model, and extracting character features of the first feature map by the first ShuffleNet model to obtain the first fusion feature map.
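The two-branch downsampling step of claim 6 can be sketched as follows. This is one plausible interpretation under stated assumptions: the "generic module" is read as a plain strided convolution, the "enhancement module" as the same shrink plus normalization and nonlinearity, fusion as channel concatenation, and the "preset first extraction parameter" as the number of channels to keep. None of these choices is confirmed by the patent text.

```python
import torch
import torch.nn as nn

class FirstDownsampling(nn.Module):
    """Hypothetical sketch of the claim-6 generic/enhancement branches."""
    def __init__(self, channels, keep_channels):
        super().__init__()
        # generic module: feature map shrinking only
        self.generic = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        # enhancement module: shrinking plus enhancement processing
        self.enhance = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.keep = keep_channels  # preset first extraction parameter (assumed meaning)

    def forward(self, x):
        generic_matrix = self.generic(x)
        enhancement_matrix = self.enhance(x)
        # fuse the two branch outputs into a fusion matrix
        fused = torch.cat([generic_matrix, enhancement_matrix], dim=1)
        # channel extraction processing -> first feature map
        return fused[:, :self.keep]

m = FirstDownsampling(channels=24, keep_channels=32)
feat = m(torch.randn(1, 24, 32, 128))  # first character feature map in
```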
7. The document auditing method of claim 1, wherein, before said inputting each of the character string region images into a lightweight character recognition model based on a ShuffleNet model, the method comprises:
acquiring a character string image sample set; the character string image sample set contains a plurality of character string image samples, each character string image sample is associated with a character string label, and the character string image samples comprise printed samples containing printed characters and handwritten samples containing handwritten characters;
inputting the character string image sample into a deep convolutional neural network model containing initial parameters;
performing, by the deep convolutional neural network model, character feature extraction on the character string image sample to obtain a sample result output by the deep convolutional neural network model according to the extracted character features;
determining a loss value according to the sample result of the character string image sample and the character string label of the character string image sample;
when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as the trained lightweight character recognition model;
and when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and recording the converged deep convolutional neural network model as the trained lightweight character recognition model.
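The claim-7 training procedure (compute a loss from sample result and label, iterate parameter updates until a preset convergence condition is met) can be sketched as a standard supervised loop. The model, data shapes, learning rate, and convergence threshold below are placeholders, not the patent's actual configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for the deep convolutional neural network with initial parameters
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

samples = torch.randn(16, 3, 16, 64)  # character string image samples (synthetic)
labels = torch.randint(0, 10, (16,))  # one character string label per sample

initial_loss, final_loss = None, None
for step in range(100):
    logits = model(samples)            # sample result output by the model
    loss = criterion(logits, labels)   # loss value from sample result vs. label
    if initial_loss is None:
        initial_loss = loss.item()
    final_loss = loss.item()
    if final_loss < 0.05:              # preset convergence condition (assumed threshold)
        break                          # converged model recorded as trained recognizer
    optimizer.zero_grad()
    loss.backward()                    # iteratively update the initial parameters
    optimizer.step()
```

On this toy data the loss falls monotonically enough that the final value is well below the starting one, mirroring the claim's convergence check.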
8. A document auditing apparatus, comprising:
the receiving module is used for acquiring a to-be-audited document image associated with a document audit instruction after receiving the document audit instruction; the document audit instruction comprises a bill number, an identity card number of an object and a bank card number of the object;
the acquisition module is used for inputting the to-be-audited document image into a character string region identification model, and acquiring at least three character string region images in the to-be-audited document image identified by the character string region identification model through a YOLO algorithm;
the recognition module is used for inputting each character string region image into a trained lightweight character recognition model, and the lightweight character recognition model performs character feature extraction on the character string region image to obtain a recognition result output by the lightweight character recognition model according to the extracted character features; the recognition result comprises a character string category and character string information of the character string region image; the character string category comprises a bill category, an identity card category and a bank card category; the character string information comprises bill information corresponding to the bill category, identity card information corresponding to the identity card category and bank card information corresponding to the bank card category; the lightweight character recognition model is a neural network model based on a ShuffleNet model;
the judging module is used for judging whether the bill information is the same as the bill number to obtain a first judging result, judging whether the identity card information is the same as the identity card number to obtain a second judging result, and judging whether the bank card information is the same as the bank card number to obtain a third judging result;
and the determining module is used for determining that the to-be-audited document image passes the audit if the first judging result, the second judging result and the third judging result each indicate that the compared values are the same.
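The judging and determining modules of claim 8 amount to three equality checks combined with a logical AND. A minimal sketch in plain Python; the dictionary field names are assumptions, not the apparatus's actual data model:

```python
def audit_document(instruction, recognized):
    """Hypothetical judging + determining logic: the document passes only
    if all three recognized strings match the audit instruction."""
    first = recognized["bill"] == instruction["bill_number"]            # first judging result
    second = recognized["id_card"] == instruction["id_card_number"]     # second judging result
    third = recognized["bank_card"] == instruction["bank_card_number"]  # third judging result
    return first and second and third

# Illustrative values only
instruction = {
    "bill_number": "B20200527",
    "id_card_number": "440300199001010011",
    "bank_card_number": "6222020000000001",
}
recognized = {
    "bill": "B20200527",
    "id_card": "440300199001010011",
    "bank_card": "6222020000000001",
}
passed = audit_document(instruction, recognized)  # → True: all three match
```

Changing any one recognized field (for example, the bill string) makes the function return `False`, which corresponds to the image failing the audit.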
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the document auditing method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the document auditing method of any one of claims 1 to 7.
CN202010461277.XA 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium Active CN111666932B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010461277.XA CN111666932B (en) 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111666932A CN111666932A (en) 2020-09-15
CN111666932B true CN111666932B (en) 2023-07-14

Family

ID=72384717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010461277.XA Active CN111666932B (en) 2020-05-27 2020-05-27 Document auditing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111666932B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712429A (en) * 2020-12-28 2021-04-27 中电金信软件有限公司 Remittance service auditing method, remittance service auditing device, computer equipment and storage medium
CN113065423A (en) * 2021-03-19 2021-07-02 国网电子商务有限公司 Ticket key information extraction method and device based on deep learning
CN113837287B (en) * 2021-09-26 2023-08-29 平安科技(深圳)有限公司 Certificate abnormal information identification method, device, equipment and medium
CN115439850B (en) * 2022-10-08 2024-06-04 招商局智融供应链服务有限公司 Method, device, equipment and storage medium for identifying image-text characters based on examination sheets
CN117875906B (en) * 2024-03-06 2024-06-04 青岛冠成软件有限公司 Electronic bill auditing method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688784A (en) * 2017-08-23 2018-02-13 福建六壬网安股份有限公司 A kind of character identifying method and storage medium based on further feature and shallow-layer Fusion Features
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
WO2019071660A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Bill information identification method, electronic device, and readable storage medium
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant