CN113362032A

CN113362032A - Verification and approval method based on artificial intelligence image recognition

Info

Publication number: CN113362032A
Application number: CN202110638429.3A
Authority: CN
Inventors: 王进一; 周玄红; 李茜茹
Original assignee: Guizhou Kaifa Future Computer Technology Co ltd
Current assignee: Guizhou Kaifa Future Computer Technology Co ltd
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2021-09-07

Abstract

The invention relates to a verification and approval method based on artificial intelligence image recognition, comprising the following steps: 1) a user handles business through a staffed counter window or an online window and submits the photographs taken: the essential-element pictures and essential-element fields are uploaded to an approval system; the approval system classifies the essential-element pictures and determines whether their categories match the current business process; 2) after the classification of the essential-element pictures meets the acceptance requirement and a user with acceptance authority clicks to accept the case, the essential-element fields are checked intelligently: the system determines whether the essential-element information is correct and whether it meets the auditing standard, and presents a reference opinion on whether to approve; an auditor then decides whether the case passes, and if it passes verification, case handling is complete. In summary, templates allow the same essential element to be given different check points for different businesses, and configuring the model per template connects the model seamlessly to the business, thereby achieving intelligent verification.

Description

Verification and approval method based on artificial intelligence image recognition

Technical Field

The invention belongs to the technical field of verification and approval methods, and in particular relates to verification and approval methods based on artificial intelligence image recognition.

Background

In the existing transaction flow, a client applying for a transaction must upload pictures of several specific required documents (essential elements), for example an identity card. Business handling staff photograph, upload, and submit the essential elements. Other staff then process and check them, manually group the essential elements, and compare and verify the filled-in information against the information in the business system. Too many human factors are involved, such as the staffing of the business organization, individual operating habits, and auditing standards. As a result, the efficiency of business handling cannot be kept stable and its accuracy cannot be guaranteed. The uploaded essential-element pictures are not directly linked to the information to be compared or to the business information, so the two must be checked and compared repeatedly. The sorting of the essential elements must also be done manually. This increases the review workload, raises the likelihood of errors in business handling, and lengthens handling time.

Disclosure of Invention

The invention provides a verification and approval method based on artificial intelligence image recognition to overcome the above shortcomings.

The invention is implemented through the following technical solution.

The invention discloses a verification and approval method based on artificial intelligence image recognition, which comprises the following steps:

1) A user handles business through a staffed counter window or an online window and submits the photographs taken: the essential-element pictures and essential-element fields are uploaded to an approval system;

the approval system classifies the essential-element pictures and determines whether their categories match the current business process;

if any required essential-element picture is missing, the approval system returns a failed verification result;

2) After the classification of the essential-element pictures meets the acceptance requirement and a user with acceptance authority clicks to accept the case, the essential-element fields are checked intelligently. The system determines whether the essential-element information is correct and whether it meets the auditing standard, and presents a reference opinion on whether to approve. An auditor then decides whether the case passes; if the case passes verification, case handling is complete.
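Step 1 can be sketched as a simple set comparison. This is a minimal illustration, not the applicant's implementation. It assumes the classifier has already returned one category per uploaded picture. The business types, the category names, and the `REQUIRED_CATEGORIES` table are all hypothetical.

```python
# Hypothetical templates: business type -> essential-element categories it requires.
REQUIRED_CATEGORIES = {
    "account_opening": {"id_card", "application_form"},
    "address_change": {"id_card", "proof_of_address"},
}


def check_essential_pictures(business_type, predicted_categories):
    """Step 1: compare classified pictures with the current business process.

    Returns ("failed", missing_categories) if a required picture is absent,
    otherwise ("ready_for_acceptance", []).
    """
    required = REQUIRED_CATEGORIES[business_type]
    missing = sorted(required - set(predicted_categories))
    if missing:
        return "failed", missing
    return "ready_for_acceptance", []


print(check_essential_pictures("account_opening", ["id_card"]))
# ('failed', ['application_form'])
```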

The classification and intelligent verification method is a paper-document image classification method based on deep-learning object detection, and specifically comprises the following steps: 1) Acquisition and labeling of training image data: images captured by terminal image-acquisition devices are uploaded to an image-processing server. There they are first manually screened, classified, and labeled, with label file names corresponding one-to-one to image file names. Each label is stored as a .txt text file, and its content is generated with the open-source labeling tool LabelImg.
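The source does not say which LabelImg output mode is used. LabelImg can save either Pascal VOC XML or YOLO .txt files. Since the labels here are .txt files, the sketch below assumes the YOLO text format: one object per line, giving the class index, then box center x, center y, width, and height, each normalized to [0, 1].

The sketch does two things:
- It parses one label file.
- It pairs images with labels by file stem, reflecting the one-to-one naming rule.

The function names are hypothetical.

```python
from pathlib import Path


def parse_yolo_label(text):
    """Parse one label file in YOLO text format (one object per line):
    class_id x_center y_center width height, coordinates normalized to [0, 1]."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:5])
        boxes.append((class_id, x, y, w, h))
    return boxes


def pair_images_with_labels(image_names, label_names):
    """Match image and label files one-to-one by file stem (a.jpg <-> a.txt).

    Returns (pairs, unlabeled_images).
    """
    labels = {Path(n).stem: n for n in label_names}
    pairs, unlabeled = [], []
    for name in image_names:
        stem = Path(name).stem
        if stem in labels:
            pairs.append((name, labels[stem]))
        else:
            unlabeled.append(name)
    return pairs, unlabeled
```

Unlabeled images are returned separately so that they can be sent back for manual labeling rather than silently dropped.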

The classification and intelligent verification method further comprises: 2) K-means clustering of the initially screened raw image data: the anchor boxes obtained by clustering replace the default initial anchor boxes of YOLO-v3;

the modified K-means clustering algorithm for anchor boxes comprises the following steps:

i. Data representation: preset the widths and heights of 9 clusters:

$$S_0 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9,$$

where $cw_i, ch_i$ are the initial width and height of cluster $i$, $S_0$ is the initial width-height set (the subscript of $S$ denotes the current clustering iteration), and $n$ is the number of clusters. The widths and heights of all boxes in the dataset are represented as follows, where $D$ is the dataset, $w_j, h_j$ are the width and height of box $j$, and $N$ is the size of the dataset:

$$D = \{w_1, h_1, w_2, h_2, \ldots, w_N, h_N\}, \quad N = |D|.$$

ii. Distance calculation: taking each cluster's width and height in $S$ as a center, compute the Euclidean distance from every box in the dataset to every cluster:

$$d_{ij} = \sqrt{d_{w,ij}^2 + d_{h,ij}^2}, \quad d_{w,ij} = w_j - cw_i, \quad d_{h,ij} = h_j - ch_i,$$

where $i$ and $j$ are the cluster index and the data index, respectively:

$$i \in \{1, 2, \ldots, 9\}, \quad j \in \{1, 2, \ldots, N\}.$$

iii. Cluster update: after the distance from every box to every cluster has been computed, assign each box to its nearest cluster. Then compute each cluster's new width and height, where $i$ denotes the cluster, $C_i$ is the set of boxes assigned to it, and $n_i = |C_i|$ is its size:

$$cw_i = \frac{1}{n_i} \sum_{j \in C_i} w_j, \quad ch_i = \frac{1}{n_i} \sum_{j \in C_i} h_j, \quad i \in \{1, 2, \ldots, 9\}.$$

These values are recombined into a new cluster set (symbols as defined above):

$$S_1 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9.$$

Steps ii and iii are repeated until the cluster set $S_k$ stabilizes; the stabilized clusters are then used as the YOLO-v3 anchor boxes.
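Below is a minimal pure-Python sketch of steps i to iii. It assumes boxes are given as (width, height) pairs in pixels. It also assumes the initial set S_0 is drawn at random from the data, because the source does not say how S_0 is preset. The function name and parameters are illustrative.

```python
import math
import random


def kmeans_anchors(boxes, k=9, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchor sizes.

    Follows the steps above: Euclidean distance in (w, h) space,
    nearest-cluster assignment, mean update, repeat until stable.
    """
    rng = random.Random(seed)
    clusters = rng.sample(boxes, k)  # S_0: initial widths and heights (assumed random)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            # ii. Euclidean distance from this box to every cluster center
            dists = [math.hypot(w - cw, h - ch) for cw, ch in clusters]
            groups[dists.index(min(dists))].append((w, h))
        # iii. New width and height of each cluster = mean of its members;
        # an empty cluster keeps its previous center.
        new_clusters = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, clusters)
        ]
        if new_clusters == clusters:  # S has stabilized
            break
        clusters = new_clusters
    # Sort by area so anchors can be handed to detection scales in order.
    return sorted(clusters, key=lambda c: c[0] * c[1])
```

The sketch uses the Euclidean distance stated above. Many YOLO anchor-clustering scripts instead use 1 - IoU as the distance, which treats large and small boxes more evenly.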

The classification and intelligent verification method further comprises: 3) Improvement of the YOLO-v3 network model: image classification is performed with the YOLO-v3 object detection model;

the core residual structure of the detection model's backbone network is modified by replacing its 3×3 convolution with an HS-Block module, as follows:

HS-Block module:

Input: x (tensor)

v₁, v₂, v₃, v₄, v₅ = split(x),

v₂₁, v₂₂ = split(conv3×3(v₂)),

v₃₁, v₃₂ = split(conv3×3(concat(conv3×3(v₃), v₂₂))),

v₄₁, v₄₂ = split(conv3×3(concat(conv3×3(v₄), v₃₂))),

v₅₁ = conv3×3(concat(conv3×3(v₅), v₄₂)).

Output: y (tensor)

y = concat(v₁, v₂₁, v₃₁, v₄₁, v₅₁).

In the above formulas, conv1×1 and conv3×3 denote 1×1 and 3×3 convolutions, split denotes dividing a feature map evenly along the channel dimension, v_i denotes a feature map, and concat denotes concatenating feature maps along the channel dimension.
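The sketch below runs the HS-Block formulas on a tiny tensor in plain Python to make their channel bookkeeping concrete. It rests on these assumptions, since the source gives no channel counts:
- split(x) divides the channels into 5 equal groups of w channels.
- Each intermediate split halves its input.
- Every conv3×3 outputs w channels, using random, untrained weights.

Under these assumptions the block emits 3.5w channels. In the residual block, the conv1×1 that follows the 3×3 position would presumably map these back to the required width. All names are illustrative.

```python
import random


def conv3x3(x, out_channels, rng):
    """Naive 3x3 convolution (stride 1, zero padding 1) with random weights.

    x is a list of channels, each an H x W list of floats. Illustration only.
    """
    in_ch, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for _ in range(out_channels):
        kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
                   for _ in range(in_ch)]
        fmap = [[0.0] * w for _ in range(h)]
        for c in range(in_ch):
            for i in range(h):
                for j in range(w):
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ii, jj = i + di, j + dj
                            if 0 <= ii < h and 0 <= jj < w:
                                fmap[i][j] += kernels[c][di + 1][dj + 1] * x[c][ii][jj]
        out.append(fmap)
    return out


def split(x, parts):
    """Split the channel list evenly into `parts` groups."""
    size = len(x) // parts
    return [x[k * size:(k + 1) * size] for k in range(parts)]


def concat(*maps):
    """Concatenate feature maps along the channel dimension."""
    return [ch for m in maps for ch in m]


def hs_block(x, rng):
    """HS-Block as written above; assumes every conv3x3 outputs w = C/5 channels."""
    v1, v2, v3, v4, v5 = split(x, 5)
    w = len(v1)  # w must be even so each split(..., 2) halves cleanly
    v21, v22 = split(conv3x3(v2, w, rng), 2)
    v31, v32 = split(conv3x3(concat(conv3x3(v3, w, rng), v22), w, rng), 2)
    v41, v42 = split(conv3x3(concat(conv3x3(v4, w, rng), v32), w, rng), 2)
    v51 = conv3x3(concat(conv3x3(v5, w, rng), v42), w, rng)
    return concat(v1, v21, v31, v41, v51)
```

With a 10-channel input (w = 2) the output has 2 + 1 + 1 + 1 + 2 = 7 channels. The first group, v₁, passes through unchanged.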

For the hyper-parameters, the largest input size currently used with YOLO-v3 (608×608) is taken as the network input.
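YOLO-v3 predicts at three scales, with strides 32, 16, and 8. A 608×608 input therefore yields 19×19, 38×38, and 76×76 grids. The 9 clustered anchors are shared out three per scale, with the largest anchors on the coarsest grid.

The sketch below shows that arithmetic. The anchor list is the default set from the public Darknet `yolov3.cfg`, used only for illustration in place of clustered anchors.

```python
def yolo_v3_grids(input_size=608, strides=(32, 16, 8)):
    """Grid size of each YOLO-v3 detection scale for a square input."""
    if input_size % max(strides):
        raise ValueError(f"input size must be a multiple of {max(strides)}")
    return [input_size // s for s in strides]


def assign_anchors(anchors, strides=(32, 16, 8)):
    """Share 9 anchors three per scale: largest on stride 32, smallest on stride 8."""
    by_area = sorted(anchors, key=lambda a: a[0] * a[1], reverse=True)
    return {s: by_area[3 * k:3 * k + 3] for k, s in enumerate(strides)}


# Default anchors from Darknet's yolov3.cfg, for illustration only.
DEFAULT_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]
print(yolo_v3_grids(608))                   # [19, 38, 76]
print(assign_anchors(DEFAULT_ANCHORS)[32])  # [(373, 326), (156, 198), (116, 90)]
```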

For image preprocessing, data augmentation is applied to all pictures. The pictures are rotated by 90° to the left and to the right, their exposure is adjusted randomly, and they are spliced, scaled, and rotated by small angles.
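Below is a minimal sketch of two of the listed augmentations on a grayscale image stored as a list of rows:
- 90° rotations to the left and to the right.
- A random exposure change, implemented as intensity scaling with clipping.

The 0.6 to 1.4 exposure range is an assumption. Splicing, scaling, and small-angle rotation are omitted for brevity.

```python
import random


def rotate90(img, clockwise=True):
    """Rotate an H x W image (list of rows) by 90 degrees."""
    if clockwise:
        return [list(row) for row in zip(*img[::-1])]
    return [list(row) for row in zip(*img)][::-1]


def adjust_exposure(img, factor):
    """Scale intensities by `factor`, clipping to the 8-bit range 0-255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng):
    """Return left/right 90-degree rotations and one random exposure change."""
    factor = rng.uniform(0.6, 1.4)  # assumed exposure range
    return [
        rotate90(img, clockwise=False),
        rotate90(img, clockwise=True),
        adjust_exposure(img, factor),
    ]
```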

Within step 3), image enhancement is also performed during training:

before the input enters the YOLO-v3 model, a pre-trained combined channel and spatial attention model is applied to the input image tensor. The attention model concentrates on and amplifies the more salient regions of the picture before they are fed into the YOLO-v3 network, which improves the network's detection of those regions:

The channel attention F_a(x) and spatial attention F_b(x) modules are as follows:

Input: x (tensor)

maxpool, avgpool = split(x),

F_a(x) = conv1×1(relu(bn(conv1×1(concat(maxpool, avgpool))))),

F_b1(x) = conv1×1(relu(bn(conv1×1(x)))),

F_b2(x) = conv1×1(relu(bn(conv3×1(conv1×3(x))))),

F_b(x) = F_b1(x) * F_b2(x).

Output: y (tensor)

y = (Softmax(F_a(x)) * x) * Softmax(F_b(x)) + x.

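In the above formulas, conv1×1, conv1×3, and conv3×1 denote convolutions with different kernel sizes, bn denotes batch normalization, relu denotes the ReLU activation, F_a and F_b denote the output feature maps of the channel and spatial branches, and Softmax is the normalizing mapping function.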
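The learned convolution branches cannot be reproduced without trained weights. The sketch below therefore covers only the parameter-free parts:
- Per-channel global max and average pooling. These are assumed to be what the maxpool and avgpool descriptors are.
- The output combination y = (Softmax(F_a(x)) * x) * Softmax(F_b(x)) + x, given precomputed branch outputs.

The source does not state the Softmax axes. The sketch assumes F_a yields one logit per channel with Softmax taken over channels. It also assumes F_b yields one logit per spatial position with Softmax taken over all positions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def global_pools(x):
    """Per-channel global max and average pooling of a C x H x W nested list."""
    maxpool = [max(max(row) for row in ch) for ch in x]
    avgpool = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    return maxpool, avgpool


def combine_attention(x, fa_logits, fb_logits):
    """y = (Softmax(F_a(x)) * x) * Softmax(F_b(x)) + x.

    fa_logits: one value per channel (assumed softmax over channels).
    fb_logits: H x W map (assumed softmax over all spatial positions).
    """
    a = softmax(fa_logits)
    h, w = len(fb_logits), len(fb_logits[0])
    flat_b = softmax([v for row in fb_logits for v in row])
    b = [flat_b[i * w:(i + 1) * w] for i in range(h)]
    # Residual connection "+ x" keeps the original signal alongside the reweighted one.
    return [[[a[c] * x[c][i][j] * b[i][j] + x[c][i][j] for j in range(w)]
             for i in range(h)] for c in range(len(x))]
```

With uniform logits, every channel weight is 1/C and every spatial weight is 1/(HW). The output is then x scaled by 1 + 1/(C·HW).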

The classification and intelligent verification method further comprises: 4) Improvement for continual training: the preliminarily trained model is applied directly to image acquisition and screening, and during this process the filtering threshold is lowered to increase the efficiency of collecting the image dataset;

the raw pictures are coarsely screened with the model parameters from the first training round. The model parameters are then continually optimized and updated and the raw pictures re-screened with the new parameters, while the YOLO-v3 hyper-parameters are manually updated and fine-tuned along the way. This cycle repeats until the accuracy of the image classifier meets business requirements.
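The screening loop can be sketched as follows. It assumes hypothetical callables for the model, training, and evaluation; none of them are named in the source. The default screening threshold of 0.3 is the value the text later gives for data collection. The round limit is an assumption.

```python
def continual_training(model, raw_images, train, evaluate, target_accuracy,
                       screen_threshold=0.3, max_rounds=10):
    """Iterate: evaluate, screen raw pictures with the current model, retrain.

    model(image) -> (label, confidence); train(samples) -> new model;
    evaluate(model) -> accuracy. All three are hypothetical callables.
    Returns the final model and the accuracy recorded each round.
    """
    history = []
    for _ in range(max_rounds):
        accuracy = evaluate(model)
        history.append(accuracy)
        if accuracy >= target_accuracy:
            break  # classifier meets the business requirement
        # Coarse screening with a lowered threshold keeps more candidate pictures;
        # in the described process, staff still check these and tune hyper-parameters.
        candidates = []
        for image in raw_images:
            label, confidence = model(image)
            if confidence >= screen_threshold:
                candidates.append((image, label))
        model = train(candidates)
    return model, history
```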

The beneficial effects of the invention are as follows. The essential elements and the key information to be checked, such as the name and ID number on an identity card, are configured through templates. Different business types are matched with different model information, and the essential information that each business needs checked is verified intelligently. By screening keyword information, the essential elements are classified effectively, and essential elements of the same type are distinguished and identified. Irregularly photographed pictures are rectified and cropped by an image rectification algorithm. Image recognition is then performed, and the key field information on each essential element is checked against the information in the system to decide whether it passes. Unidentified pictures are collected in one place. Finally, a suggestion on whether the audit should pass is given. This improves auditor efficiency and shortens business handling time. In summary, templates allow the same essential element to be given different check points for different businesses, and configuring the model connects it seamlessly to the business, thereby achieving intelligent verification.
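Template configuration and the key-field comparison can be sketched as below. The `Template` structure, category names, and field names are hypothetical. The source only says three things: templates configure the essential elements and key fields to be checked (such as the name and ID number on an identity card), the recognized fields are compared with the system, and the result is a suggestion for the auditor.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical per-business template of essential elements and key fields."""
    business_type: str
    required_categories: set
    key_fields: dict = field(default_factory=dict)  # category -> field names to check


def verify_fields(template, recognized, system_record):
    """Compare recognized key fields with the business system record.

    recognized: {category: {field: value}} read from the pictures by image recognition.
    Returns a reference suggestion and the list of mismatched fields.
    """
    mismatches = []
    for category, names in template.key_fields.items():
        for name in names:
            value = recognized.get(category, {}).get(name)
            if value is None or value != system_record.get(name):
                mismatches.append(f"{category}.{name}")
    return ("suggest approve" if not mismatches else "suggest reject"), mismatches
```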

The invention is further explained below with reference to the drawings and the detailed description.

Drawings

FIG. 1 is a general flow diagram of the present invention.

FIG. 2 is a flowchart of the image classifier operation of the present invention.

FIG. 3 compares the original residual block of the present invention with the HS-Block that replaces its 3×3 convolution.

FIG. 4 is a flowchart of the channel attention operation of the present invention.

FIG. 5 is a flowchart of the spatial attention operation of the present invention.

FIG. 6 is a flowchart of the combined channel and spatial attention operation of the present invention.

Detailed Description

A verification and approval method based on artificial intelligence image recognition is disclosed. Its general flow is shown in FIG. 1, and the system is outlined as follows:

1) When a user handles business at a staffed counter window or an online window, the user photographs, uploads, and submits the essential elements. The system first classifies the essential-element pictures and determines whether their categories match the current business process. If any required essential-element picture is missing, the reference verification result is "failed".

2) After the grouping of essential-element pictures meets the acceptance requirement and a user with acceptance authority clicks to accept, the essential-element fields are checked intelligently. The system determines whether the essential-element information is correct and whether it meets the auditing standard, and presents a reference opinion on whether to approve. An auditor then decides whether the case is approved; once verification passes, case handling is complete.

Specifically, classification and verification are performed by an image classifier. It uses a paper-document image classification method based on deep-learning object detection, which comprises the following steps:

1) Acquisition and labeling of training image data: images captured by terminal image-acquisition devices are uploaded to an image-processing server, where they are first manually screened, classified, and labeled. Labels are stored as .txt text files whose names correspond one-to-one to the image file names, and the label content is generated with the open-source labeling tool LabelImg.

2) K-means clustering of the initially screened raw image data: the clustered anchor boxes replace the default initial anchor boxes of YOLO-v3, which can markedly improve the regression accuracy of bounding boxes for small targets.

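The modified K-means clustering algorithm for anchor boxes comprises the following steps (all symbols below have the same meaning as above):

i. Data representation: preset the widths and heights of 9 clusters:

$$S_0 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9,$$

$$D = \{w_1, h_1, w_2, h_2, \ldots, w_N, h_N\}, \quad N = |D|.$$

ii. Distance calculation: taking each cluster's width and height in $S$ as a center, compute the Euclidean distance from every box in the dataset to every cluster:

$$d_{ij} = \sqrt{d_{w,ij}^2 + d_{h,ij}^2}, \quad d_{w,ij} = w_j - cw_i, \quad d_{h,ij} = h_j - ch_i,$$

$$i \in \{1, 2, \ldots, 9\}, \quad j \in \{1, 2, \ldots, N\}.$$

iii. Cluster update: assign each box to its nearest cluster and compute each cluster's new width and height, where $C_i$ is the set of boxes assigned to cluster $i$ and $n_i = |C_i|$:

$$cw_i = \frac{1}{n_i} \sum_{j \in C_i} w_j, \quad ch_i = \frac{1}{n_i} \sum_{j \in C_i} h_j, \quad i \in \{1, 2, \ldots, 9\}.$$

The values are recombined into a new cluster set:

$$S_1 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9.$$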

3) Improvement of the YOLO-v3 network model: image classification is performed with the YOLO-v3 object detection model. Performing image classification through object detection effectively addresses the low accuracy of conventional image classification on paper-document pictures. Experiments show that, for this task, the local-feature detection and recognition of object detection outperforms the global-feature recognition of image classification. The core residual structure of the detection model's backbone network is modified. The original residual module and the HS-Block module that replaces its 3×3 convolution are compared as follows (see FIG. 3):

Original residual module:

Input: x (tensor)

Output: y (tensor)

y = add(conv1×1(conv3×3(conv1×1(x))), x).

HS-Block module:

Input: x (tensor)

v₁, v₂, v₃, v₄, v₅ = split(x),

v₂₁, v₂₂ = split(conv3×3(v₂)),

v₃₁, v₃₂ = split(conv3×3(concat(conv3×3(v₃), v₂₂))),

v₄₁, v₄₂ = split(conv3×3(concat(conv3×3(v₄), v₃₂))),

v₅₁ = conv3×3(concat(conv3×3(v₅), v₄₂)).

Output: y (tensor)

y = concat(v₁, v₂₁, v₃₁, v₄₁, v₅₁).

In the above formulas, conv1×1 and conv3×3 denote 1×1 and 3×3 convolutions, split denotes dividing a feature map evenly along the channel dimension, add denotes element-wise addition, v_i denotes a feature map, and concat denotes concatenating feature maps along the channel dimension.

As shown above, the core 3×3 convolution of the original residual block is replaced by the hierarchical-split convolution block HS-Block. The HS-Block structure is taken from the paper "HS-ResNet: Hierarchical-Split Block on Convolutional Neural Network".

The original network and the improved network were trained on the same image dataset.

Experimental data show that the loss of the original network stabilizes at about 6.0, while that of the improved network stabilizes at about 4.5, a moderate improvement.

For the hyper-parameters, the largest input size currently used with YOLO-v3 (608×608) is taken as the network input.

Because the model runs on a server rather than on the terminal's mobile device, the speed requirement is not very high. In the trade-off between detection accuracy and speed, more resources can therefore be devoted to accuracy, leaving considerable room to improve recognition accuracy.

Different terminal image-acquisition devices capture paper documents with different orientations, illumination intensities, and angles. Data augmentation is therefore applied to all pictures during preprocessing. Specifically, pictures are rotated by 90° to the left and to the right, their exposure is adjusted randomly, and they are spliced, scaled, and rotated by small angles. This enhances the generalization ability of the model.

Further image enhancement is performed during training.

Before the input enters the YOLO-v3 model, a pre-trained combined channel and spatial attention model is applied to the input image tensor. The attention model concentrates on and amplifies the more salient regions of the image before they are fed into the YOLO-v3 network, which improves the network's detection of those regions.

The channel attention F_a(x) and spatial attention F_b(x) modules are as follows (see FIGS. 4, 5, and 6):

Input: x (tensor)

maxpool, avgpool = split(x),

F_a(x) = conv1×1(relu(bn(conv1×1(concat(maxpool, avgpool))))),

F_b1(x) = conv1×1(relu(bn(conv1×1(x)))),

F_b2(x) = conv1×1(relu(bn(conv3×1(conv1×3(x))))),

F_b(x) = F_b1(x) * F_b2(x).

Output: y (tensor)

y = (Softmax(F_a(x)) * x) * Softmax(F_b(x)) + x.

4) Improvement for continual training: because manual screening of image data is too labor-intensive, the preliminarily trained model is applied directly to image acquisition and screening. In this process the filtering threshold needs to be lowered, which can greatly increase the efficiency of collecting the image dataset.

The specific workflow of the image classifier is shown in FIG. 2.

As the flowchart shows, the raw pictures are first coarsely screened with the model parameters from the first training round. The model parameters are then continually optimized and updated and the raw pictures re-screened with the new parameters, while the YOLO-v3 hyper-parameters are manually updated and fine-tuned along the way. This cycle repeats until the accuracy of the image classifier meets business requirements.

The invention has the following notable innovations:

1) The clustered anchor boxes replace the initial anchor boxes of YOLO-v3, so that anchor box sizes are set according to each dataset.

2) The network model is the YOLO-v3 object detection model, and image classification is performed through object detection. The 3×3 convolution in the core residual block of the backbone network can be replaced by either an HS-Block or a Ghost module structure.

3) In preprocessing, data augmentation is applied to all pictures: rotation by 90° to the left and to the right, random exposure adjustment, and splicing, scaling, and small-angle rotation of the picture. In the hyper-parameter settings, 608×608×3 is used as the model input.

4) Before the YOLO-v3 model, an attention mechanism (spatial and channel attention) is applied to the input image tensor.

The preliminarily trained model is applied directly to picture collection, with the classifier's filtering threshold lowered from 0.6, as used in the formal production environment, to 0.3.
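One way to turn detection output into an image-level category using the two thresholds given above is sketched below. The source does not spell out this rule. The sketch assumes that the highest-confidence detection at or above the threshold decides the picture's category. It also assumes that pictures with no such detection are treated as unidentified.

```python
def classify_by_detection(detections, threshold):
    """Turn object-detection output into an image-level category.

    detections: list of (category, confidence) pairs for one picture.
    The highest-confidence detection at or above `threshold` decides the category;
    otherwise the picture is treated as unidentified.
    """
    kept = [d for d in detections if d[1] >= threshold]
    if not kept:
        return "unidentified"
    return max(kept, key=lambda d: d[1])[0]


PRODUCTION_THRESHOLD = 0.6  # formal production environment
SCREENING_THRESHOLD = 0.3   # data collection with the preliminary model

dets = [("id_card", 0.45), ("bank_card", 0.2)]
print(classify_by_detection(dets, PRODUCTION_THRESHOLD))  # unidentified
print(classify_by_detection(dets, SCREENING_THRESHOLD))   # id_card
```

The lower screening threshold lets borderline pictures into the candidate pool during collection. The production threshold keeps uncertain pictures out of automatic classification.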

The above description covers only some specific embodiments of the present invention. The embodiments are not exhaustive, and the scope of the invention is defined by the description and the other technical points. Well-known details or common general knowledge in these schemes are not described at length here. The above embodiments do not limit the present invention in any way, and all technical solutions obtained by those skilled in the art through equivalent substitution or equivalent transformation fall within the protection scope of the present invention. The protection scope of this application is determined by the contents of the claims, and the embodiments and other parts of the description may be used to interpret the claims.

Claims

1. A verification and approval method based on artificial intelligence image recognition, characterized by comprising the following steps:

2. The verification and approval method according to claim 1, wherein the classification and intelligent verification method is a paper-document image classification method based on deep-learning object detection, and specifically comprises the following steps: 1) acquisition and labeling of training image data: images captured by terminal image-acquisition devices are uploaded to an image-processing server, where they are first manually screened, classified, and labeled, with label file names corresponding one-to-one to image file names.

3. The verification and approval method according to claim 2, wherein the labels are .txt text files.

4. The verification and approval method according to claim 2, wherein the label content is generated with the open-source labeling tool LabelImg.

5. The verification and approval method according to claim 2, wherein the classification and intelligent verification method comprises the following step: 2) K-means clustering of the initially screened raw image data: the anchor boxes obtained by clustering replace the default initial anchor boxes of YOLO-v3;

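i. Data representation: preset the widths and heights of 9 clusters:

$$S_0 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9,$$

wherein $cw_i, ch_i$ are the initial width and height of cluster $i$, $S_0$ is the initial width-height set, the subscript of $S$ denotes the current clustering iteration, and $n$ is the number of clusters. The widths and heights of all boxes in the dataset are represented as follows, where $D$ is the dataset, $w_j, h_j$ are the width and height of box $j$, and $N$ is the size of the dataset:

$$D = \{w_1, h_1, w_2, h_2, \ldots, w_N, h_N\}, \quad N = |D|,$$

$$d_{ij} = \sqrt{d_{w,ij}^2 + d_{h,ij}^2}, \quad d_{w,ij} = w_j - cw_i, \quad d_{h,ij} = h_j - ch_i,$$

$$i \in \{1, 2, \ldots, 9\}, \quad j \in \{1, 2, \ldots, N\},$$

where $d_{ij}$ is the Euclidean distance from box $j$ to cluster $i$, and $C_i$ is the set of boxes nearest to cluster $i$, with $n_i = |C_i|$:

$$cw_i = \frac{1}{n_i} \sum_{j \in C_i} w_j, \quad ch_i = \frac{1}{n_i} \sum_{j \in C_i} h_j,$$

$$S_1 = \{cw_1, ch_1, cw_2, ch_2, \ldots, cw_n, ch_n\}, \quad n = 9.$$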

6. The verification and approval method according to claim 5, wherein the classification and intelligent verification method comprises the following step: 3) improvement of the YOLO-v3 network model: image classification is performed with the YOLO-v3 object detection model;

the 3×3 convolution of the residual structure is replaced with an HS-Block module as follows:

HS-Block module:

Input: x (tensor)

v₁, v₂, v₃, v₄, v₅ = split(x),

v₂₁, v₂₂ = split(conv3×3(v₂)),

v₃₁, v₃₂ = split(conv3×3(concat(conv3×3(v₃), v₂₂))),

v₄₁, v₄₂ = split(conv3×3(concat(conv3×3(v₄), v₃₂))),

v₅₁ = conv3×3(concat(conv3×3(v₅), v₄₂)).

Output: y (tensor)

y = concat(v₁, v₂₁, v₃₁, v₄₁, v₅₁).

in the above formulas, conv1×1 and conv3×3 denote 1×1 and 3×3 convolutions, split denotes dividing a feature map evenly along the channel dimension, v_i denotes a feature map, and concat denotes concatenating feature maps along the channel dimension;

7. The verification and approval method according to claim 6, wherein, within step 3), the classification and intelligent verification method further performs image enhancement during training:

the channel attention F_a(x) and spatial attention F_b(x) modules are as follows:

Input: x (tensor)

maxpool, avgpool = split(x),

F_a(x) = conv1×1(relu(bn(conv1×1(concat(maxpool, avgpool))))),

F_b1(x) = conv1×1(relu(bn(conv1×1(x)))),

F_b2(x) = conv1×1(relu(bn(conv3×1(conv1×3(x))))),

F_b(x) = F_b1(x) * F_b2(x).

Output: y (tensor)

y = (Softmax(F_a(x)) * x) * Softmax(F_b(x)) + x.

8. The verification and approval method according to claim 6 or 7, wherein the classification and intelligent verification method comprises the following step: 4) improvement for continual training: the preliminarily trained model is applied directly to image acquisition and screening, and during this process the filtering threshold is lowered to increase the efficiency of collecting the image dataset; the raw pictures are coarsely screened with the model parameters from the first training round, the model parameters are continually optimized and updated, the raw pictures are re-screened with the new parameters, and the YOLO-v3 hyper-parameters are manually updated and fine-tuned along the way; this cycle repeats until the accuracy of the image classifier meets business requirements.