CN115984853A - Character recognition method and device

Publication number: CN115984853A
Application number: CN202310016446.2A
Authority: CN (China)
Legal status: Pending
Inventors: 雷鹤芳, 陈阳阳, 孙斌华, 邹亚
Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Original language: Chinese (zh)


Abstract

The application provides a character recognition method and a character recognition apparatus, which can be used in the technical field of artificial intelligence. The method comprises the following steps: acquiring a target multi-line character image; determining a plurality of prediction frame positions in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm on batch historical multi-line character images and the prediction frame position labels corresponding to those images; determining the score corresponding to each prediction frame according to a preset weight matrix and the position of each prediction frame; and transversely splicing the prediction frames based on their corresponding scores to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image. The method and apparatus can improve the accuracy and efficiency of character recognition and thereby improve the business handling experience of customers.

Description

Character recognition method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a character recognition method and device.
Background
At present, bank counter business certificates involve a large amount of content to be entered, checked and audited, covering a wide range and creating a heavy workload. When a single element of a service certificate contains a large amount of content, factors such as a non-standard writing order in the customer's handwritten area and differing printing habits for content formats across branches make the entry, checking and auditing performed by service personnel labor-intensive, slow down overall service processing, and degrade the customer's experience in handling business.
With the advent of the intelligent era, models trained in artificial intelligence by positioning and recognition on stock data can recognize characters whose typesetting is relatively standard or whose contents are independent. However, when recognizing content with a non-standard format in which the recognition order matters, only part of the content may be recognized, or the content may be recognized in the wrong order, so the workload of bank business personnel cannot be effectively relieved. To respond quickly to the demand for intelligent recognition of counter service certificates, to detect and recognize irregular multi-line characters at high speed, and to reduce the labor cost of services, a detection method that can handle irregular multi-line character content is urgently needed.
Disclosure of Invention
Aiming at at least one problem in the prior art, the application provides a character recognition method and apparatus, which can improve the accuracy and efficiency of character recognition and further improve the business handling experience of customers.
In order to solve the above technical problem, in a first aspect, the present application provides a character recognition method, comprising:
acquiring a target multi-line character image;
determining a plurality of prediction frame positions in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and prediction frame position labels corresponding to the batch historical multi-line character images;
determining the corresponding score of each prediction frame according to a preset weight matrix and the position of each prediction frame;
and transversely splicing all the prediction frames based on the scores corresponding to all the prediction frames to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
In one embodiment, the text recognition method further includes:
acquiring batch historical multi-line character images and corresponding prediction frame position labels thereof;
and training a target detection algorithm by applying batch historical multi-line character images and the corresponding prediction frame position labels thereof to obtain the preset positioning model.
In one embodiment, the determining the score corresponding to each prediction frame according to the preset weight matrix and the position of each prediction frame includes:
constructing a position matrix based on the positions of the prediction frames;
multiplying the preset weight matrix and the position matrix to obtain a score matrix, wherein the score matrix comprises the scores corresponding to the respective prediction frames.
In one embodiment, the transversely splicing the prediction frames based on the scores corresponding to the prediction frames to obtain a new text image includes:
and sequencing the prediction frames from large to small based on the corresponding scores of the prediction frames, and transversely splicing the prediction frames based on the sequencing result to obtain a new character image.
In one embodiment, before the determining the plurality of predicted frame positions in the target multi-line character image according to the target multi-line character image and a preset positioning model, the method further includes:
acquiring batch historical multi-line character images and corresponding character labels thereof;
performing a verification step, the verification step comprising: applying the batch historical multi-line character images and the preset positioning model to obtain the position of a prediction frame of each historical multi-line character image; obtaining new character images corresponding to the historical multi-line character images according to a preset weight matrix and the prediction frame positions of the historical multi-line character images; carrying out character recognition on each new character image to obtain a character recognition result of each historical multi-line character image; and obtaining the identification accuracy rate based on the character identification result and the character label of each historical multi-line character image, and if the identification accuracy rate is greater than an accuracy rate threshold value, determining that the preset positioning model passes verification.
In one embodiment, the text recognition method further includes:
if the identification accuracy is smaller than or equal to an accuracy threshold, determining that the preset positioning model fails to be verified;
updating the number of training rounds of the target detection algorithm;
training a target detection algorithm based on the updated training round number, batch historical multi-line character images and the corresponding prediction frame position labels to obtain a retrained positioning model;
and executing the verification step again by using the retrained positioning model until the preset positioning model passes the verification.
In one embodiment, the performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image includes:
applying a preset character recognition model and the new character image to obtain a character recognition result of the target multi-line character image;
the preset character recognition model is obtained by pre-training a convolutional recurrent neural network model based on batch character images and the character labels corresponding to the batch character images.
In a second aspect, the present application provides a character recognition apparatus, comprising:
the acquisition module is used for acquiring a target multi-line character image;
the positioning module is used for determining the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, and the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and prediction frame position labels corresponding to the batch historical multi-line character images;
the determining module is used for determining the corresponding scores of the prediction frames according to the preset weight matrix and the positions of the prediction frames;
and the character recognition module is used for transversely splicing the prediction frames based on the scores corresponding to the prediction frames to obtain new character images, and performing character recognition on the new character images to obtain character recognition results of the target multi-line character images.
In one embodiment, the text recognition apparatus further includes:
the first historical data acquisition module is used for acquiring a batch of historical multi-line character images and corresponding prediction frame position labels thereof;
and the training module is used for training a target detection algorithm by applying batch historical multi-line character images and the corresponding prediction box position labels thereof to obtain the preset positioning model.
In one embodiment, the determining module comprises:
the construction unit is used for constructing and obtaining a position matrix based on the positions of the prediction frames;
a score matrix determining unit, configured to multiply the preset weight matrix and the position matrix to obtain a score matrix, where the score matrix comprises the scores corresponding to the respective prediction frames.
In one embodiment, the word recognition module comprises:
and the splicing unit is used for sequencing the prediction frames from large to small based on the scores corresponding to the prediction frames, and transversely splicing the prediction frames based on the sequencing result to obtain a new character image.
In one embodiment, the text recognition apparatus further comprises:
the second historical data acquisition module is used for acquiring a batch of historical multi-line character images and character labels corresponding to the historical multi-line character images;
a first verification module for performing a verification step, the verification step comprising: applying the batch historical multi-line character images and the preset positioning model to obtain the position of a prediction frame of each historical multi-line character image; obtaining new character images corresponding to the historical multi-line character images according to a preset weight matrix and the prediction frame positions of the historical multi-line character images; carrying out character recognition on each new character image to obtain a character recognition result of each historical multi-line character image; and obtaining the identification accuracy rate based on the character identification result and the character label of each historical multi-line character image, and if the identification accuracy rate is greater than an accuracy rate threshold value, determining that the preset positioning model passes verification.
In one embodiment, the text recognition apparatus further includes:
the second verification module is used for determining that the preset positioning model fails to verify if the identification accuracy is smaller than or equal to an accuracy threshold;
the updating module is used for updating the number of training rounds of the target detection algorithm;
the retraining module is used for training the target detection algorithm based on the updated training round number, batch historical multi-line character images and the corresponding prediction frame position labels thereof to obtain a retrained positioning model;
and the secondary verification module is used for applying the retrained positioning model to perform the verification step again until the preset positioning model passes the verification.
In one embodiment, the word recognition module comprises:
the character recognition unit is used for applying a preset character recognition model and the new character image to obtain a character recognition result of the target multi-line character image;
the preset character recognition model is obtained by pre-training a convolutional recurrent neural network model based on batch character images and the character labels corresponding to the batch character images.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the text recognition method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer instructions that, when executed, implement the word recognition method.
According to the above technical solutions, the application provides a character recognition method and apparatus, wherein the method comprises: acquiring a target multi-line character image; determining a plurality of prediction frame positions in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm on batch historical multi-line character images and the prediction frame position labels corresponding to those images; determining the score corresponding to each prediction frame according to a preset weight matrix and the position of each prediction frame; and transversely splicing the prediction frames based on their corresponding scores to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image. In this way the accuracy and efficiency of character recognition can be improved and the business handling experience of customers further improved. Specifically, position detection is performed on the irregular multi-line characters in the service voucher through a series of YOLO computations, and the order in which the fragments are transversely spliced is determined by combining the constructed position weight values; this effectively solves the problem of recognizing irregular multi-line characters, accelerates the replacement of the manual process by intelligent recognition, optimizes service handling timeliness, and improves the customer's experience in handling business.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a text recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-line character image segment with an irregular character sequence in an example of the present application;
FIG. 3 is a schematic diagram of a multi-line character image segment with an irregular character sequence in another example of the present application;
FIG. 4 is a schematic diagram of a multi-line character image segment with a normal character sequence in an example of the present application;
FIG. 5 is a schematic flowchart of steps 011 and 012 of the character recognition method in the embodiment of the present application;
FIG. 6 is a flow chart illustrating steps 301 and 302 of a text recognition method in an embodiment of the present application;
FIG. 7 is a schematic diagram of a new text image in one example of the present application;
FIG. 8 is a flowchart illustrating steps 101 to 105 of a character recognition method according to an exemplary application of the present application;
FIG. 9 is a flow chart illustrating steps 21 to 24 of a character recognition method according to an exemplary embodiment of the present application;
FIG. 10 is a flow chart illustrating steps 31 to 34 of a character recognition method in an application example of the present application;
FIG. 11 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application;
fig. 12 is a schematic block diagram of a system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
In order to solve the problems in the prior art, the application contemplates: uploading a multi-line character image; applying a yolo positioning model obtained by pre-training to determine the prediction frame position of each character field in the multi-line character image; determining, using the position weights, the score of the prediction box corresponding to each character segment; transversely splicing the prediction frames to obtain a new image with all characters in one line; and obtaining the character recognition result of the target image based on the new image and a character recognition model. The problem of recognizing irregular multi-line characters is thereby solved.
Based on this, in order to improve the accuracy and efficiency of character recognition and further improve the business handling experience of clients, an embodiment of the present application provides a character recognition apparatus. The apparatus may be a server or a client device, and the client device may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, the text recognition part may be executed on the server side as described above, or all operations may be completed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device may further include a processor if all operations are performed in the client device.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
The server and the client device may communicate using any suitable network protocol, including network protocols not yet developed at the filing date of this application. The network protocol may include, for example, the TCP/IP protocol, the UDP/IP protocol, the HTTP protocol, the HTTPS protocol, or the like. Of course, the network protocol may also include, for example, an RPC protocol (Remote Procedure Call Protocol), a REST protocol (Representational State Transfer Protocol), and the like used on top of the above protocols.
It should be noted that the character recognition method and apparatus disclosed in the present application can be used in the field of financial technology, and can also be used in any field other than the field of financial technology.
The following examples are intended to illustrate the details.
In order to improve the accuracy and efficiency of character recognition and further improve the business handling experience of clients, this embodiment provides a character recognition method whose execution subject is a character recognition apparatus, the character recognition apparatus including but not limited to a server. As shown in fig. 1, the method specifically includes the following contents:
step 100: and acquiring a target multi-line character image.
Specifically, the target multi-line character image may be an image corresponding to a bank counter business certificate, in which the character sequence may be irregular; fig. 2 is a schematic diagram of a multi-line character image segment with an irregular character sequence in one example, fig. 3 is a schematic diagram of a multi-line character image segment with an irregular character sequence in another example, and fig. 4 is a schematic diagram of a multi-line character image segment with a normal character sequence in one example. In fig. 2 to 4, x represents a character, a dashed box represents a detection box, and the numbers indicate the correct character order among the detection boxes.
Step 200: and determining a plurality of prediction frame positions in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and prediction frame position labels corresponding to the batch historical multi-line character images.
Specifically, the prediction frame position may represent the coordinate information of the prediction frame in the target multi-line character image; the target detection algorithm may be the YOLO (You Only Look Once) algorithm. YOLO is a highly efficient real-time target detection algorithm whose core idea is to input the picture to be detected into a convolutional network and output a plurality of tensors at the fully connected layer, where each tensor comprises a target probability $P_O$; the target result is screened out using the Intersection over Union (IoU) between the prediction and the real labeling box (GT) together with Non-Maximum Suppression (NMS).
Step 300: and determining the corresponding score of each prediction frame according to the preset weight matrix and the position of each prediction frame.
Step 400: and transversely splicing all the prediction frames based on the scores corresponding to all the prediction frames to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
Specifically, the new character image is a character image obtained by rearranging the irregularly ordered characters in the target multi-line character image into the normal order. The character recognition result of the new character image may be regarded as equivalent to the character recognition result of the target multi-line character image.
Specifically, irregular multi-line characters can be used as training samples, each line being used as an area label to construct the GT, and the positioning model is obtained through a preset number of training rounds and tensor fitting. From the position information output by the positioning model, a position matrix $Ml_{(i,j)}$ is constructed and multiplied by the preset weight matrix $Mw_{(j,1)}$ to obtain the score matrix $Ms_{(i,1)}$, which yields the splicing order of the fragment characters; finally, an ordinary recognition model can accurately recognize the character content.
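As an illustration of the overall flow described above, the following Python sketch strings the four steps together. It is a minimal sketch under stated assumptions: locate_boxes, build_weight_matrix, stitch and recognize_line are hypothetical stand-ins for the preset positioning model, the weight-matrix construction, the splicing and the recognition model, not APIs defined by this application.

```python
import numpy as np

def recognize_multiline(image, locate_boxes, build_weight_matrix, stitch, recognize_line):
    # Step 200: the preset positioning model returns prediction boxes as (x, y, w, h)
    boxes = locate_boxes(image)
    # Step 300: position matrix Ml, one row (x, y, ex, ey) per prediction box
    Ml = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes], dtype=float)
    Mw = build_weight_matrix(image, boxes)   # preset weight matrix, a 4x1 column vector
    Ms = Ml @ Mw                             # score matrix Ms = Ml x Mw
    # Step 400: splice the boxes transversely in descending order of score
    order = np.argsort(-Ms.ravel())
    new_image = stitch(image, [boxes[i] for i in order])
    return recognize_line(new_image)         # e.g. a CRNN-based recognition model
```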
In order to further improve the reliability of the positioning model, and further improve the accuracy of the character recognition by applying the reliable positioning model, as shown in fig. 5, in an embodiment, before step 200, the method further includes:
a step 011: and acquiring batch historical multi-line character images and corresponding prediction frame position labels thereof.
Step 012: and training a target detection algorithm by applying batch historical multi-line character images and the position labels of the prediction frames corresponding to the batch historical multi-line character images to obtain the preset positioning model.
To further improve the accuracy of determining the respective prediction box scores, as shown in FIG. 6, in one embodiment, step 300 includes:
step 301: and constructing and obtaining a position matrix based on the positions of the prediction frames.
Specifically, for any one prediction box, assuming that the abscissa of the upper left point is x, the ordinate is y, the prediction box width is w, and the prediction box height is h, the abscissa of the lower right point of the prediction box is ex = x + w, and the ordinate of the lower right point is ey = y + h.
Step 302: multiplying the preset weight matrix and the position matrix to obtain a score matrix, wherein the score matrix comprises the scores corresponding to the respective prediction boxes.
Specifically, if the number of prediction frames is n, the position matrix is an n × 4 matrix (one row per prediction frame) and the preset weight matrix is a 4-dimensional column vector, with one weight per coordinate component x, y, ex and ey.
In one example, assume that the number of prediction frames in the target multi-line character image is 4, denoted $C_1$, $C_2$, $C_3$ and $C_4$. The position matrix is then:

$$Ml_{(4,4)} = \begin{pmatrix} x_{C1} & y_{C1} & ex_{C1} & ey_{C1} \\ x_{C2} & y_{C2} & ex_{C2} & ey_{C2} \\ x_{C3} & y_{C3} & ex_{C3} & ey_{C3} \\ x_{C4} & y_{C4} & ex_{C4} & ey_{C4} \end{pmatrix}$$

where $x_{C1}$ denotes the abscissa of the upper-left point of prediction box $C_1$, $y_{C1}$ the ordinate of its upper-left point, and $ex_{C1}$ and $ey_{C1}$ the abscissa and ordinate of its lower-right point. One of the prediction boxes can be selected at random as $C_1$, one of the remaining prediction boxes as $C_2$, and so on.
The preset weight matrix is:

$$Mw_{(4,1)} = \begin{pmatrix} w_x \\ w_y \\ \lambda_1 \\ \lambda_2 \end{pmatrix}$$

where $w_x$ and $w_y$ are the preset weight values for the upper-left point and $\lambda_1$ and $\lambda_2$ are the preset weight values for the lower-right point, with $\lambda_1 = w_{C1} + w_{C2} + w_{C3} + w_{C4}$ ($w_{C1}$ to $w_{C4}$ denoting in turn the widths of prediction boxes $C_1$ to $C_4$) and $\lambda_2 = h_{C1} + h_{C2} + h_{C3} + h_{C4}$ ($h_{C1}$ to $h_{C4}$ denoting in turn the heights of prediction boxes $C_1$ to $C_4$). Let $L$ and $H$ be the length and height of the target multi-line character image: when $L > H$, $w_x \in (0, 0.4]$, where $w_x$ can be set according to actual conditions and $w_y = 1 - w_x$; when $H > L$, $w_x \in (0.4, 1]$ and $w_y = 1 - w_x$.
According to $Ms_{(i,1)} = Ml_{(i,j)} \times Mw_{(j,1)}$, the score matrix of this example is obtained, where $score_{Ci}$ denotes the score of prediction box $C_i$:

$$Ms_{(4,1)} = \begin{pmatrix} score_{C1} \\ score_{C2} \\ score_{C3} \\ score_{C4} \end{pmatrix}, \qquad score_{Ci} = w_x \cdot x_{Ci} + w_y \cdot y_{Ci} + \lambda_1 \cdot ex_{Ci} + \lambda_2 \cdot ey_{Ci}$$
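To make the matrix computation above concrete, the following sketch evaluates $Ms = Ml \times Mw$ with NumPy for four prediction boxes; the sample coordinates and the choice $w_x = 0.3$ (from the $(0, 0.4]$ range of the $L > H$ case) are illustrative assumptions only.

```python
import numpy as np

# Four prediction boxes as (x, y, w, h); illustrative values for a 2-row layout
boxes = [(10, 5, 80, 30), (100, 5, 80, 30), (10, 40, 80, 30), (100, 40, 80, 30)]

# Position matrix Ml: one row (x, y, ex, ey) per box, with ex = x + w, ey = y + h
Ml = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes], dtype=float)

w_x = 0.3                                  # L > H case: w_x in (0, 0.4]
w_y = 1.0 - w_x
lam1 = sum(w for (_, _, w, _) in boxes)    # lambda_1: sum of prediction-box widths
lam2 = sum(h for (_, _, _, h) in boxes)    # lambda_2: sum of prediction-box heights
Mw = np.array([[w_x], [w_y], [lam1], [lam2]])

Ms = Ml @ Mw                               # score matrix: one score per prediction box
order = np.argsort(-Ms.ravel())            # splicing order: descending score
print(Ms.ravel(), order)
```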
to improve the accuracy of the new text image, in one embodiment, step 400 includes: and sequencing the prediction frames from large to small based on the corresponding scores of the prediction frames, and transversely splicing the prediction frames based on the sequencing result to obtain a new character image.
Specifically, the transverse splicing may refer to arranging the first character of one prediction box after the last character of the previous prediction box, and so on, until the splicing of all prediction boxes is completed. In one example, the new character image obtained after the transverse splicing is shown in fig. 7, and the numbers indicate the correct character order among the detection boxes.
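A minimal sketch of the transverse splicing itself is given below, assuming the prediction boxes have already been sorted by descending score. Since cv2.hconcat requires equal heights, each crop is first resized to a common line height; that resizing step is an implementation assumption, not something prescribed by this application.

```python
import cv2

def stitch_transversely(image, sorted_boxes, line_height=48):
    """Crop each prediction box and concatenate the crops into a single text line."""
    crops = []
    for (x, y, w, h) in sorted_boxes:
        crop = image[y:y + h, x:x + w]
        new_w = max(1, int(round(w * line_height / float(h))))
        crops.append(cv2.resize(crop, (new_w, line_height)))  # dsize is (width, height)
    return cv2.hconcat(crops)  # the new single-line character image
```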
To further improve the reliability of the positioning model, in an embodiment, before step 200, the method further includes:
step 021: and acquiring batch historical multi-line character images and character labels corresponding to the batch historical multi-line character images.
Step 022: performing a verification step, the verification step comprising: applying the batch historical multi-line character images and the preset positioning model to obtain the position of a prediction frame of each historical multi-line character image; obtaining new character images corresponding to the historical multi-line character images according to a preset weight matrix and the prediction frame positions of the historical multi-line character images; carrying out character recognition on each new character image to obtain a character recognition result of each historical multi-line character image; and obtaining the identification accuracy rate based on the character identification result and the character label of each historical multi-line character image, and if the identification accuracy rate is greater than an accuracy rate threshold value, determining that the preset positioning model passes verification.
In order to further improve the reliability of the positioning model, in an embodiment, the text recognition method further includes:
step 023: and if the identification accuracy is less than or equal to an accuracy threshold, determining that the preset positioning model fails to be verified.
And 024: and updating the number of training rounds of the target detection algorithm.
Specifically, the number of training rounds of the target detection algorithm may be gradually increased according to a certain rule, for example, by adding 50 rounds each time the number of training rounds is updated.
Step 025: and training the target detection algorithm based on the updated training round number, the batch historical multi-line character images and the prediction frame position labels corresponding to the batch historical multi-line character images to obtain a retrained positioning model.
Step 026: and executing the verification step again by applying the retrained positioning model until the preset positioning model passes the verification.
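The verification and retraining loop of steps 021 to 026 can be sketched as follows. This is a hedged outline: train_yolo and recognize_with are hypothetical placeholders for training the target detection algorithm and running the full recognition pipeline, and the initial round number and accuracy threshold are illustrative.

```python
def build_positioning_model(train_imgs, box_labels, val_imgs, val_texts,
                            train_yolo, recognize_with, rounds=100, threshold=0.95):
    """Retrain until recognition accuracy on historical images exceeds the threshold."""
    while True:
        model = train_yolo(train_imgs, box_labels, rounds=rounds)  # hypothetical trainer
        hits = sum(recognize_with(model, img) == text              # full-pipeline check
                   for img, text in zip(val_imgs, val_texts))
        accuracy = hits / len(val_imgs)
        if accuracy > threshold:        # verification passes; keep this model
            return model
        rounds += 50                    # e.g. increase the number of training rounds
```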
To further improve the reliability of the text recognition, in one embodiment, step 400 includes:
step 401: and applying a preset character recognition model and the new character image to obtain a character recognition result of the target multi-line character image.
Step 402: the preset character recognition model is obtained by pre-training a convolutional recurrent neural network model based on batch character images and the character labels corresponding to the batch character images.
In particular, the convolutional recurrent neural network model may be a CRNN model. An existing character recognition method can also be applied to perform character recognition on the new character image.
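For concreteness, a minimal CRNN-style recognizer is sketched below in PyTorch: a convolutional feature extractor followed by a bidirectional LSTM and a per-time-step classification head, as is typical for CTC-decoded text recognition. The layer sizes are illustrative assumptions and are not specified by this application.

```python
import torch
import torch.nn as nn

class MiniCRNN(nn.Module):
    """Sketch of a convolutional recurrent network for single-line text recognition."""
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        self.rnn = nn.LSTM(128 * (img_h // 4), 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)   # per-time-step logits for CTC decoding

    def forward(self, x):                       # x: (batch, 1, img_h, width)
        f = self.cnn(x)                         # (batch, 128, img_h/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one feature per image column
        out, _ = self.rnn(f)
        return self.fc(out)                     # decode with CTC to obtain the text
```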
To further illustrate the present solution, the present application provides an application example of a text detection method, including: determining a modeling scene, selecting a training sample, processing a YOLO algorithm, constructing a position weight value, and evaluating an effect, as shown in fig. 8, which is specifically described as follows:
step 101, determining a modeling scene:
irregular multi-line characters are caused by the writing and printing habits of different customers or business groups. Despite these different habits, people can recognize the characters at a glance according to daily cognition, so the character layout follows certain rules: position information can be given through YOLO, and irregular multi-line characters can be recognized accurately by splicing the segments in the order given by the weight scores.
102, training sample selection:
and extracting irregular lines of character fragment graphs from the historical counter business voucher to serve as training sample data. And after marking the data, putting the sample data and the marked data in a newly established data set DSyw.
And 103. A YOLO algorithm processing step. As shown in fig. 9, step 103 includes:
Step 21, setting the label classification $C = (C_1, C_2, \ldots, C_n)$ according to the historical samples and the service information, the data set having n classes in total. The subscript n of the C classification does not represent an order but merely distinguishes categories.
Step 22, dividing the input irregular multi-line text picture into S × S cells; if the center point of a block of characters falls in a cell, that cell is responsible for that character object. Taking the upper-left vertex of the cell as the origin (0, 0) and taking x, y, w and h of a character object's bbox as the reference-frame values (x and y are the upper-left coordinates of the bbox, and w and h are its width and height), the label of the cell is obtained, comprising the confidence $P_O$, the bbox coordinates and the class indicators $C_1, \ldots, C_n$.
The network output dimension is then $S \times S \times (B \times 5 + C)$, where B (the number of bbox predictors) defaults to 2 and the 5 corresponds to x, y, w, h and $P_O$. If the cell contains a target character, $P_O = 1$, the coordinates $b_x$, $b_y$, $b_h$, $b_w$ are obtained, the indicator of that character's class is $C_i = 1$ and the other labels are $C_j = 0$ (j ≠ i); if not, $P_O = 0$ and the remaining values need not be interpreted.
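As a quick numeric check of the output dimension $S \times S \times (B \times 5 + C)$ above (the grid size and class count here are illustrative assumptions):

```python
S, B, num_classes = 7, 2, 4                   # e.g. 7x7 grid, 2 bbox predictors, 4 classes
output_dim = S * S * (B * 5 + num_classes)    # the 5 is (x, y, w, h, P_O) per predictor
print(output_dim)                             # 7 * 7 * (2 * 5 + 4) = 686
```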
Step 23, calculating the predicted confidence coefficient $\delta_{pred}$ from the bbox and the GT, i.e.

$$IoU = \frac{area(bbox \cap GT)}{area(bbox \cup GT)}$$

and then $\delta_{pred} = P_O \times IoU$. Each bbox thereby has a $\delta_{pred}$, and non-maximum suppression (NMS) is used to filter each label classification to obtain the best prediction result. Following the loss calculation method given in the yolo paper, the loss comprises the coordinate loss between the predicted bbox and the GT:

$$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]$$

the confidence loss against $\delta_{GT}$ (where $\delta_{GT} = 1$):

$$\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(\delta_{pred}-\delta_{GT}\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(\delta_{pred}-\delta_{GT}\right)^2$$

and the $C_n$ class loss against the true classification:

$$\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2$$
after the three calculations are added, the neural network of yolo is updated again.
Step 24, through training over the set number of rounds, the model is continuously adjusted and optimized, and the yolo positioning model is finally output.
It is particularly noted that the C classification of yolo output is also unordered.
And 104, constructing a position weight value processing step. As shown in fig. 10, step 104 includes:
Step 31, node conversion is performed according to the position information predicted by the trained yolo model to obtain the position information of the upper-left and lower-right points of each prediction box: x, y, w and h are kept as predicted, and ex = x + w, ey = y + h (where ex and ey are the lower-right-point position information of the prediction box).
Step 32, take a simple 2-row × 2-column text structure as an example; the sequential link may be left-right-left-right or left-right-right. A position matrix $Ml_{(i,j)}$ is constructed according to the position information of the 4 categories:

$$Ml_{(4,4)} = \begin{pmatrix} x_{C1} & y_{C1} & ex_{C1} & ey_{C1} \\ x_{C2} & y_{C2} & ex_{C2} & ey_{C2} \\ x_{C3} & y_{C3} & ex_{C3} & ey_{C3} \\ x_{C4} & y_{C4} & ex_{C4} & ey_{C4} \end{pmatrix}$$

where C1 to C4 denote the 4 categories, x and y denote the upper-left point of each category's prediction box, and ex and ey denote its lower-right point.
Step 33, the preset weight matrix is:

$$Mw_{(4,1)} = \begin{pmatrix} w_x \\ w_y \\ \lambda_1 \\ \lambda_2 \end{pmatrix}$$

Detect the length L and height H of the irregular multi-line text image. When $L > H$, $w_x \in (0, 0.4]$, $w_y = 1 - w_x$, $\lambda_1 = w_{C1} + w_{C2} + w_{C3} + w_{C4}$; when $H > L$, $w_x \in (0.4, 1]$, $w_y = 1 - w_x$, $\lambda_2 = h_{C1} + h_{C2} + h_{C3} + h_{C4}$. Here $w_x$ and $w_y$ denote the preset weight values of the upper-left point, $\lambda_1$ and $\lambda_2$ the preset weight values of the lower-right point, C1 to C4 the 4 categories, $w_{C1}$ to $w_{C4}$ their prediction-box widths, and $h_{C1}$ to $h_{C4}$ their prediction-box heights.
Step 34, the score matrix is calculated according to $Ms_{(i,1)} = Ml_{(i,j)} \times Mw_{(j,1)}$; for the present case, with $score_{Ci}$ denoting the score of each of the 4 prediction boxes of the 2-row × 2-column structure:

$$Ms_{(4,1)} = Ml_{(4,4)} \times Mw_{(4,1)} = \begin{pmatrix} score_{C1} \\ score_{C2} \\ score_{C3} \\ score_{C4} \end{pmatrix}$$
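Steps 31 to 34 can be collected into one helper, sketched below; the branch on L versus H follows step 33, while the concrete w_x values picked inside each range are illustrative assumptions.

```python
import numpy as np

def score_boxes(boxes, img_len, img_h, w_x=None):
    """Build Mw per steps 31-33 and return the score matrix Ms of step 34."""
    if w_x is None:
        w_x = 0.3 if img_len > img_h else 0.7   # from (0, 0.4] or (0.4, 1]
    w_y = 1.0 - w_x
    lam1 = sum(w for (_, _, w, _) in boxes)     # lambda_1: sum of box widths
    lam2 = sum(h for (_, _, _, h) in boxes)     # lambda_2: sum of box heights
    Mw = np.array([[w_x], [w_y], [lam1], [lam2]])
    Ml = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes], dtype=float)
    return Ml @ Mw                              # Ms_(i,1) = Ml_(i,j) x Mw_(j,1)
```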
step 105, effect evaluation:
A new verification sample (left-right-left-right format) is taken and a comparison verification sample (left-right-right format) is set. The score matrix $Ms_{(i,1)}$ is obtained according to the above process, the corresponding bboxes are matched in descending order of score, the prediction boxes are cut out and transversely spliced, and a new image is finally output; other existing general-purpose character recognition models are then called for recognition, and whether the recognized content matches the label value used in business handling is verified. If the model prediction accuracy reaches expectations, the model can be deployed and applied. If the evaluated model effect does not meet expectations, the number of yolo training rounds and the preset weight matrix $Mw_{(j,1)}$ are adjusted, and model training and effect verification are carried out again until the effect meets expectations.
In terms of software, in order to improve accuracy and efficiency of character recognition and further improve business handling experience of a client, the application provides an embodiment of a character recognition device for implementing all or part of contents in the character recognition method, and referring to fig. 11, the character recognition device specifically includes the following contents:
the acquisition module 01 is used for acquiring a target multi-line character image;
a positioning module 02, configured to determine the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, where the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and the prediction frame position labels corresponding to the batch historical multi-line character images;
the determining module 03 is configured to determine a score corresponding to each prediction frame according to a preset weight matrix and the position of each prediction frame;
and the character recognition module 04 is used for transversely splicing the prediction frames based on the scores corresponding to the prediction frames to obtain new character images, and performing character recognition on the new character images to obtain character recognition results of the target multi-line character images.
In one embodiment, the text recognition apparatus further includes:
the first historical data acquisition module is used for acquiring a batch of historical multi-line character images and corresponding prediction frame position labels thereof;
and the training module is used for applying the batch historical multi-line character images and the prediction frame position labels corresponding to the batch historical multi-line character images to train a target detection algorithm to obtain the preset positioning model.
In one embodiment, the determining module comprises:
the construction unit is used for constructing and obtaining a position matrix based on the positions of the prediction frames;
a score matrix determining unit, configured to multiply the preset weight matrix and the position matrix to obtain a score matrix, where the score matrix comprises the scores corresponding to the respective prediction frames.
In one embodiment, the text recognition module comprises:
and the splicing unit is used for sequencing the prediction frames from large to small based on the scores corresponding to the prediction frames, and transversely splicing the prediction frames based on the sequencing result to obtain a new character image.
In one embodiment, the text recognition device further comprises:
the second historical data acquisition module is used for acquiring a batch of historical multi-line character images and character labels corresponding to the historical multi-line character images;
a first verification module for performing a verification step, the verification step comprising: applying the batch historical multi-line character images and the preset positioning model to obtain the position of a prediction frame of each historical multi-line character image; obtaining new character images corresponding to the historical multi-line character images according to a preset weight matrix and the prediction frame positions of the historical multi-line character images; carrying out character recognition on each new character image to obtain a character recognition result of each historical multi-line character image; and obtaining the identification accuracy rate based on the character identification result and the character label of each historical multi-line character image, and if the identification accuracy rate is greater than an accuracy rate threshold value, determining that the preset positioning model passes verification.
In one embodiment, the text recognition apparatus further includes:
the second verification module is used for determining that the preset positioning model fails to verify if the identification accuracy is smaller than or equal to an accuracy threshold;
the updating module is used for updating the number of training rounds of the target detection algorithm;
the retraining module is used for training the target detection algorithm based on the updated training round number, batch historical multi-line character images and the corresponding prediction frame position labels thereof to obtain a retrained positioning model;
and the secondary verification module is used for applying the retrained positioning model to perform the verification step again until the preset positioning model passes the verification.
In one embodiment, the word recognition module comprises:
the character recognition unit is used for applying a preset character recognition model and the new character image to obtain a character recognition result of the target multi-line character image;
the preset character recognition model is obtained by pre-training a convolutional recurrent neural network model based on batch character images and the character labels corresponding to the batch character images.
The embodiments of the text recognition apparatus provided in this specification may be specifically used for executing the processing flow of the embodiments of the text recognition method, and the functions of the text recognition apparatus are not described herein again, and reference may be made to the detailed description of the embodiments of the text recognition method.
In terms of hardware, in order to improve accuracy and efficiency of character recognition and further improve business handling experience of a client, the application provides an embodiment of an electronic device for implementing all or part of contents in the character recognition method, where the electronic device specifically includes the following contents:
a processor (processor), a memory (memory), a communication Interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the communication interface is used for realizing information transmission between the character recognition device and related equipment such as a user terminal; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, and the like, but the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment for implementing the text recognition method and the embodiment for implementing the text recognition apparatus in the embodiments, and the contents thereof are incorporated herein, and repeated descriptions are omitted here.
Fig. 12 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 12, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this FIG. 12 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one or more embodiments of the present application, the text recognition functionality can be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:
step 100: and acquiring a target multi-line character image.
Step 200: and determining the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and prediction frame position labels corresponding to the batch historical multi-line character images.
Step 300: and determining the corresponding score of each prediction frame according to the preset weight matrix and the position of each prediction frame.
Step 400: and transversely splicing all the prediction frames based on the scores corresponding to all the prediction frames to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
From the above description, the electronic device provided in the embodiment of the application can improve accuracy and efficiency of character recognition, and further improve business handling experience of a client.
In another embodiment, the word recognition device may be configured separately from the central processor 9100, for example, the word recognition device may be configured as a chip connected to the central processor 9100, and the word recognition function is realized under the control of the central processor.
As shown in fig. 12, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 12; further, the electronic device 9600 may further include components not shown in fig. 12, which can be referred to in the related art.
As shown in fig. 12, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store relevant information as well as the programs that process it, and the central processor 9100 can execute the programs stored in the memory 9140 to realize information storage or processing and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid state memory, e.g., a read only memory (ROM), a random access memory (RAM), a SIM card, or the like. It may also be a memory that holds information even when powered off and that can be selectively erased and rewritten with new data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 being used for storing application programs and function programs or for executing the operation flow of the electronic device 9600 through the central processor 9100.
The memory 9140 can also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
According to the description, the electronic equipment provided by the embodiment of the application can improve the accuracy and efficiency of character recognition, and further improve the business handling experience of customers.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps in the character recognition method in the foregoing embodiment, where the computer-readable storage medium stores a computer program, and the computer program implements all the steps of the character recognition method in the foregoing embodiment when executed by a processor, for example, the processor implements the following steps when executing the computer program:
step 100: and acquiring a target multi-line character image.
Step 200: and determining the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm based on batch historical multi-line character images and prediction frame position labels corresponding to the batch historical multi-line character images.
Step 300: and determining the corresponding score of each prediction frame according to the preset weight matrix and the position of each prediction frame.
Step 400: and transversely splicing all the prediction frames based on the scores corresponding to all the prediction frames to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application can improve accuracy and efficiency of character recognition, and further improve business handling experience of a client.
In the present application, the embodiments of the method are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. For relevant parts, reference may be made to the description of the method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments are used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to aid understanding of the method and its core idea. A person skilled in the art may, following the idea of the present application, vary the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A character recognition method, comprising:
acquiring a target multi-line character image;
determining the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm on a batch of historical multi-line character images and the prediction frame position labels corresponding to those images;
determining the score corresponding to each prediction frame according to a preset weight matrix and the position of each prediction frame; and
transversely splicing the prediction frames based on their corresponding scores to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
2. The character recognition method of claim 1, further comprising:
acquiring a batch of historical multi-line character images and their corresponding prediction frame position labels; and
training a target detection algorithm on the batch of historical multi-line character images and their corresponding prediction frame position labels to obtain the preset positioning model.
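The application does not name a specific target detection algorithm. As one possibility, if a YOLO-family detector were used via the ultralytics package, the pre-training of claim 2 might look like the sketch below; the dataset file and hyperparameters are illustrative placeholders, not values from the application.

```python
from ultralytics import YOLO

# Start from a pretrained checkpoint and fine-tune on the batch of historical
# multi-line character images with their prediction frame position labels.
model = YOLO("yolov8n.pt")
model.train(
    data="multiline_chars.yaml",  # hypothetical dataset config (images + box labels)
    epochs=50,
    imgsz=640,
)
# Trained weights land under runs/detect/train/weights/ by default.
```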
3. The character recognition method of claim 1, wherein determining the score corresponding to each prediction frame according to the preset weight matrix and the position of each prediction frame comprises:
constructing a position matrix based on the positions of the prediction frames; and
multiplying the preset weight matrix by the position matrix to obtain a score matrix, wherein the score matrix comprises the score corresponding to each prediction frame.
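Claim 3 reduces scoring to a single matrix product. A minimal sketch, assuming each prediction frame position is an (x, y, w, h) row and the preset weight matrix has one row; the weight values shown are an illustrative choice, not values from the application.

```python
import numpy as np

def score_frames(boxes, W):
    """Score prediction frames per claim 3: score matrix = weight matrix x position matrix."""
    P = np.asarray(boxes, dtype=np.float32)  # position matrix, shape (n, 4)
    S = W @ P.T                              # score matrix, shape (1, n)
    return S.ravel()                         # one score per prediction frame

# Weighting the row coordinate (y) far more heavily than the column (x), with
# negative signs so that top-left frames score highest, yields the usual
# top-to-bottom, left-to-right reading order under the descending sort of claim 4.
W = np.array([[-1.0, -1000.0, 0.0, 0.0]], dtype=np.float32)
```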
4. The character recognition method of claim 1, wherein transversely splicing the prediction frames based on their corresponding scores to obtain a new character image comprises:
sorting the prediction frames in descending order of their corresponding scores, and transversely splicing the prediction frames based on the sorting result to obtain the new character image.
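A sketch of the descending sort and transverse splice of claim 4, assuming OpenCV-style images (NumPy arrays indexed [y, x]) and the `score_frames` helper above; the height normalization is an implementation detail the claim leaves open.

```python
import cv2
import numpy as np

def stitch_by_score(image, boxes, scores):
    """Crop each prediction frame and splice the crops into a single line,
    ordered from the largest score to the smallest (claim 4)."""
    order = np.argsort(-np.asarray(scores))          # descending by score
    crops = [image[y:y + h, x:x + w] for x, y, w, h in (boxes[i] for i in order)]
    # hconcat needs one uniform height, so scale every crop to the tallest one
    target_h = max(c.shape[0] for c in crops)
    crops = [cv2.resize(c, (max(1, round(c.shape[1] * target_h / c.shape[0])), target_h))
             for c in crops]
    return cv2.hconcat(crops)                        # the new character image
```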
5. The character recognition method of claim 2, further comprising, before determining the positions of the plurality of prediction frames in the target multi-line character image according to the target multi-line character image and the preset positioning model:
acquiring a batch of historical multi-line character images and their corresponding character labels; and
performing a verification step, the verification step comprising: applying the preset positioning model to the batch of historical multi-line character images to obtain the prediction frame positions of each historical multi-line character image; obtaining a new character image corresponding to each historical multi-line character image according to the preset weight matrix and its prediction frame positions; performing character recognition on each new character image to obtain a character recognition result for each historical multi-line character image; and computing a recognition accuracy based on the character recognition results and the character labels of the historical multi-line character images, wherein the preset positioning model passes verification if the recognition accuracy is greater than an accuracy threshold.
6. The character recognition method of claim 5, further comprising:
determining that the preset positioning model fails verification if the recognition accuracy is less than or equal to the accuracy threshold;
updating the number of training rounds of the target detection algorithm;
training the target detection algorithm based on the updated number of training rounds, the batch of historical multi-line character images, and their corresponding prediction frame position labels to obtain a retrained positioning model; and
performing the verification step again with the retrained positioning model until the preset positioning model passes verification.
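Claims 5 and 6 together describe a train-verify loop: measure end-to-end recognition accuracy on labeled historical images, and if it does not clear the threshold, raise the number of training rounds and retrain. A sketch under the same hypothetical helpers; the retry increment and cap are illustrative choices, not part of the claims.

```python
def verify(model, images, labels, W, recognize_text):
    """Verification step of claim 5: end-to-end recognition accuracy."""
    correct = sum(
        recognize_multiline(img, model, W, recognize_text) == label
        for img, label in zip(images, labels)
    )
    return correct / len(images)

def train_until_verified(train_fn, verify_fn, epochs=50, threshold=0.95,
                         step=10, max_epochs=500):
    """Claims 5-6: retrain with more rounds until verification passes.
    train_fn(epochs) -> model; verify_fn(model) -> accuracy in [0, 1]."""
    while epochs <= max_epochs:
        model = train_fn(epochs)
        if verify_fn(model) > threshold:   # strictly greater, per claim 5
            return model
        epochs += step                     # claim 6: update the training rounds
    raise RuntimeError("positioning model failed verification within max_epochs")
```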
7. The character recognition method of claim 1, wherein performing character recognition on the new character image to obtain the character recognition result of the target multi-line character image comprises:
applying a preset character recognition model to the new character image to obtain the character recognition result of the target multi-line character image,
wherein the preset character recognition model is obtained by pre-training a convolutional recurrent neural network (CRNN) model on a batch of character images and the character labels corresponding to those images.
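The CRNN of claim 7 is the standard convolutional-recurrent architecture for line recognition: CNN features are read as a left-to-right sequence, a bidirectional LSTM models context, and per-timestep logits are trained with CTC loss. A minimal PyTorch-flavoured sketch with illustrative layer sizes:

```python
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch (claim 7): CNN -> BiLSTM -> per-timestep logits,
    suitable for CTC training on single-line character images."""
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4                       # height after two 2x poolings
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)     # num_classes includes the CTC blank

    def forward(self, x):                         # x: (batch, 1, img_h, width)
        f = self.cnn(x)                           # (batch, 128, img_h//4, width//4)
        f = f.permute(0, 3, 1, 2).flatten(2)      # (batch, time, 128 * feat_h)
        out, _ = self.rnn(f)                      # (batch, time, 512)
        return self.fc(out)                       # logits for CTC decoding
```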
8. A character recognition apparatus, comprising:
an acquisition module for acquiring a target multi-line character image;
a positioning module for determining the positions of a plurality of prediction frames in the target multi-line character image according to the target multi-line character image and a preset positioning model, wherein the preset positioning model is obtained by pre-training a target detection algorithm on a batch of historical multi-line character images and the prediction frame position labels corresponding to those images;
a determining module for determining the score corresponding to each prediction frame according to a preset weight matrix and the position of each prediction frame; and
a character recognition module for transversely splicing the prediction frames based on their corresponding scores to obtain a new character image, and performing character recognition on the new character image to obtain a character recognition result of the target multi-line character image.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the character recognition method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium having computer instructions stored thereon which, when executed, implement the character recognition method of any one of claims 1 to 7.
CN202310016446.2A 2023-01-06 2023-01-06 Character recognition method and device Pending CN115984853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310016446.2A CN115984853A (en) 2023-01-06 2023-01-06 Character recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310016446.2A CN115984853A (en) 2023-01-06 2023-01-06 Character recognition method and device

Publications (1)

Publication Number Publication Date
CN115984853A true CN115984853A (en) 2023-04-18

Family

ID=85966565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310016446.2A Pending CN115984853A (en) 2023-01-06 2023-01-06 Character recognition method and device

Country Status (1)

Country Link
CN (1) CN115984853A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079282A (en) * 2023-08-16 2023-11-17 读书郎教育科技有限公司 Intelligent dictionary pen based on image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination