CN113807326B: Standard-format table character recognition method and device

Info

Publication number
CN113807326B
CN113807326B (application CN202111358601.6A)
Authority
CN
China
Prior art keywords
character
cell image
recognized
image
cell
Prior art date
Legal status
Active
Application number
CN202111358601.6A
Other languages
Chinese (zh)
Other versions
CN113807326A (en)
Inventor
孙鹏程
王潇茵
杜婉茹
刘萱
李瑞群
Current Assignee
Aerospace Hongkang Intelligent Technology Beijing Co ltd
Original Assignee
Aerospace Hongkang Intelligent Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Aerospace Hongkang Intelligent Technology Beijing Co ltd filed Critical Aerospace Hongkang Intelligent Technology Beijing Co ltd
Priority to CN202111358601.6A
Publication of CN113807326A
Application granted
Publication of CN113807326B
Legal status: Active

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Neural network learning methods

Abstract

The invention relates to a standard-format table character recognition method and apparatus. The recognition method includes: obtaining at least one cell image to be recognized from a standard-format table image to be recognized; inputting the at least one cell image to be recognized into a trained character classifier to obtain a character classification result of each cell image to be recognized, and inputting the at least one cell image to be recognized into a trained text recognizer to obtain a first character recognition result of each cell image to be recognized; determining the weight of each character type of each cell image to be recognized; obtaining a second character recognition result of each cell image to be recognized according to its first character recognition result and the weight of each character type; filling each second character recognition result into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized; and splicing the cell character recognition results into a character recognition result for the whole standard-format table.

Description

Standard-format table character recognition method and device
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a standard-format table character recognition method and apparatus.
Background
Optical character recognition (OCR) is the process by which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and translates those shapes into computer text using a character recognition method. Deep learning is now widely combined with OCR. In the related art, feature learning is performed with a CNN + RNN network structure, but on continuous texts of indefinite length this approach distinguishes poorly between letters and visually similar Chinese characters, and it struggles to classify cell contents composed of randomly mixed character types.
Disclosure of Invention
The present disclosure provides a standard-format table character recognition method and apparatus to solve at least the problems in the related art described above, although the disclosure is not required to solve any particular one of those problems.
According to a first aspect of the embodiments of the present disclosure, there is provided a standard-format table character recognition method, including: obtaining at least one cell image to be recognized according to a standard-format table image to be recognized, wherein each cell image to be recognized is an image of a cell of the table in the standard-format table image to be recognized; inputting the at least one cell image to be recognized into a trained character classifier to obtain a character classification result of each cell image to be recognized, and inputting the at least one cell image to be recognized into a trained text recognizer to obtain a first character recognition result of each cell image to be recognized; determining the weight of each character type of each cell image to be recognized according to the character classification result of that cell image; obtaining a second character recognition result of each cell image to be recognized according to the first character recognition result and the character-type weights of that cell image; filling each second character recognition result into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized; and splicing the cell character recognition results into a standard-format table character recognition result.
Optionally, the character classifier has a VGG network structure and is trained by the following steps: acquiring a first training data set, wherein the first training data set is at least one cell image from at least one standard-format table image; inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image; determining a loss value of the character classifier according to the character classification result of each cell image and the true character result of each cell image; and training the character classifier by adjusting the parameters of the character classifier according to the loss value.
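These training steps map naturally onto a standard supervised loop. Below is a minimal sketch in PyTorch; the names `classifier` and `loader` (yielding batches of cell images and their true character types) are assumptions for illustration, not part of the disclosure:

```python
import torch
import torch.nn as nn

def train_character_classifier(classifier, loader, epochs=10, lr=1e-4):
    """A plausible training loop for the cell-image character classifier."""
    optimizer = torch.optim.Adam(classifier.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # loss between predicted class and true character result
    classifier.train()
    for _ in range(epochs):
        for cell_images, true_labels in loader:
            logits = classifier(cell_images)
            loss = criterion(logits, true_labels)
            optimizer.zero_grad()
            loss.backward()   # adjust the classifier's parameters according to the loss value
            optimizer.step()
    return classifier
```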
Optionally, the character classifier includes a plurality of sequentially arranged convolution structures and at least one fully connected layer, each convolution structure including at least one convolution layer and one max-pooling layer. Inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image includes: inputting the first training data set into the first convolution structure to obtain a first output of each cell image; starting from i = 1 and iterating until i = n - 1, inputting the i-th output of each cell image into the (i + 1)-th convolution structure to obtain the (i + 1)-th output of each cell image, where n is the number of convolution structures and n is greater than 1; inputting the n-th output of each cell image into the at least one fully connected layer to obtain the (n + 1)-th output of each cell image; and obtaining the character classification result of each cell image from the (n + 1)-th output through a softmax function, wherein the at least one convolution layer of each convolution structure performs a convolution operation and the max-pooling layer of each convolution structure performs a max-pooling operation.
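For concreteness, this convolution-structure iteration can be sketched in PyTorch as follows. Only the overall shape (n convolution structures, fully connected layers, softmax) comes from the disclosure; the layer counts and channel widths are assumptions patterned on VGG16, which the detailed description uses later:

```python
import torch
import torch.nn as nn

class ConvStructure(nn.Module):
    """One convolution structure: at least one convolution layer, then max pooling."""
    def __init__(self, in_ch, out_ch, num_convs):
        super().__init__()
        layers = []
        for i in range(num_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(kernel_size=2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class CharacterClassifier(nn.Module):
    """n = 5 convolution structures followed by fully connected layers."""
    def __init__(self, num_classes=5):  # number/symbol/English/Chinese/other
        super().__init__()
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
        self.convs = nn.ModuleList(ConvStructure(*c) for c in cfg)
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 1000), nn.ReLU(inplace=True),
            nn.Linear(1000, num_classes),
        )

    def forward(self, x):           # x: (batch, 3, 224, 224)
        for conv in self.convs:     # the i-th output feeds the (i + 1)-th convolution structure
            x = conv(x)
        return self.fc(x)           # logits; apply softmax to obtain the classification result

# classification = torch.softmax(CharacterClassifier()(images), dim=1)
```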
Optionally, the text recognizer includes a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network, and the text recognizer is trained by the following steps: acquiring a second training data set, wherein the second training data set is at least one cell image from at least one standard-format table image; inputting the second training data set into the convolutional neural network to obtain a feature sequence of each cell image in the at least one cell image; inputting the feature sequence of each cell image into the BiLSTM network to obtain a second character recognition result of each cell image; determining the conditional probability of the second character recognition result of each cell image according to the CTC algorithm; determining the gradient of the BiLSTM network according to the conditional probability; and training the text recognizer by adjusting the parameters of the convolutional neural network and/or the BiLSTM network according to the gradient.
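A hedged sketch of this training procedure using PyTorch's built-in CTC loss (the disclosure names the CTC algorithm but no particular library, so the loss implementation and data layout here are assumptions):

```python
import torch
import torch.nn as nn

def train_text_recognizer(recognizer, loader, lr=1e-4):
    """Hypothetical CTC training step; `recognizer` is assumed to return
    per-timestep class logits of shape (seq_len, batch, class_num)."""
    optimizer = torch.optim.Adam(recognizer.parameters(), lr=lr)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)  # gradient follows from the CTC conditional probability
    recognizer.train()
    for images, targets, target_lengths in loader:  # targets: concatenated label indices
        log_probs = recognizer(images).log_softmax(2)
        seq_len, batch, _ = log_probs.shape
        input_lengths = torch.full((batch,), seq_len, dtype=torch.long)
        loss = ctc(log_probs, targets, input_lengths, target_lengths)
        optimizer.zero_grad()
        loss.backward()   # gradients flow through both the BiLSTM and the CNN
        optimizer.step()
```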
Optionally, the conditional probability is expressed as:

$$p(l \mid y) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid y)$$

where $p(l \mid y)$ is the conditional probability that, given the input $y$, the bidirectional long short-term memory network outputs the label sequence $l$; $p(\pi \mid y)$ is the probability of the path $\pi$ through the bidirectional long short-term memory network given the input $y$; $\mathcal{B}^{-1}(l)$ denotes the set of all paths $\pi$ whose output after the $\mathcal{B}$ transform is $l$; and $\mathcal{B}$ is a compression transform.
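For intuition, the compression transform $\mathcal{B}$ merges consecutive repeated characters and then removes blanks. A toy sketch of this transform; the blank symbol '-' and the example strings are assumptions:

```python
def ctc_collapse(path, blank="-"):
    """The compression transform B: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

# Both paths below collapse to "hello", so p("hello" | y) sums the
# probabilities of these (and every other) path in B^{-1}("hello").
assert ctc_collapse("-hh-e-l-ll-oo-") == "hello"
assert ctc_collapse("hhee-ll-lo--") == "hello"
```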
Optionally, the determining, according to the character classification result of each cell image to be recognized, the weight of each character type of each cell image to be recognized includes: acquiring a preset initial weight value of each character type of each cell image to be recognized, wherein the character types at least include a number type, a symbol type, an English type, a Chinese character type and other types; and adjusting the preset initial weight value of each character type of each cell image to be recognized according to the character classification result of that cell image, so as to determine the weight of each character type of each cell image to be recognized, wherein the character classification result at least includes a numeric character set, a symbol character set, an English character set, a Chinese character set and other character sets.
Optionally, the obtaining a second character recognition result of each cell image to be recognized according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized includes: adjusting the first character recognition result of each cell image to be recognized based on the weight of each character type of that cell image, so as to obtain the second character recognition result of each cell image to be recognized, wherein, for each cell image to be recognized, character types with higher weights account for a larger proportion of the second character recognition result, and character types with lower weights account for a smaller proportion.
According to a second aspect of the embodiments of the present disclosure, there is provided a standard-format table character recognition apparatus, including: a cell image acquisition unit configured to obtain at least one cell image to be recognized according to a standard-format table image to be recognized, wherein each cell image to be recognized is an image of a cell of the table in the standard-format table image to be recognized; a prediction unit configured to input the at least one cell image to be recognized into a trained character classifier to obtain a character classification result of each cell image to be recognized, and to input the at least one cell image to be recognized into a trained text recognizer to obtain a first character recognition result of each cell image to be recognized; a weight acquisition unit configured to determine the weight of each character type of each cell image to be recognized according to the character classification result of that cell image; a weighting unit configured to obtain a second character recognition result of each cell image to be recognized according to the first character recognition result and the character-type weights of that cell image; a filling unit configured to fill each second character recognition result into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized; and a splicing unit configured to splice the cell character recognition results into a standard-format table character recognition result.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the standard-format table character recognition method according to the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the standard-format table character recognition method according to the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
According to the standard-format table character recognition method and apparatus of the present disclosure, the weight of each character type is determined and combined into the text recognizer, so that text of indefinite length can be recognized, recognition accuracy is improved, the misrecognition of visually similar characters in CRNN OCR is avoided as much as possible, and the cost of manual proofreading is reduced.
In addition, according to the standard-format table character recognition method and apparatus of the present disclosure, each cell image to be recognized is recognized individually, making the recognition more accurate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a standard-format table character recognition method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating the structure of a text recognizer according to an exemplary embodiment of the present disclosure.
Fig. 3 is an overall framework diagram illustrating a standard-format table character recognition method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram illustrating the structure of a character classifier according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating a standard-format table character recognition apparatus according to an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram illustrating an electronic device 600 according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plural ones of the items", and "all of the items". For example, "includes at least one of A and B" covers three parallel cases: (1) includes A; (2) includes B; (3) includes A and B. Likewise, "perform at least one of step one and step two" covers three parallel cases: (1) perform step one; (2) perform step two; (3) perform step one and step two.
Optical character recognition (OCR) is the process by which an electronic device (such as a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and translates those shapes into computer text using a character recognition method. That is, for printed characters, the characters in a paper document are converted optically into an image file of a black-and-white dot matrix, and recognition software then converts the characters in the image into a text format for further editing and processing by word-processing software.
At present, standard-format table character recognition mostly combines deep learning with OCR. End-to-end OCR technologies based on deep learning include, but are not limited to, CRNN OCR and attention OCR. The two methods differ mainly in the final output layer (translation layer), that is, in how the sequence feature information learned by the network is converted into the final recognition result. Both CRNN OCR and attention OCR adopt a convolutional neural network (CNN) + recurrent neural network (RNN) structure in the feature learning stage; CRNN OCR uses the CTC algorithm for alignment, while attention OCR uses an attention mechanism.
For standard-format table character recognition, the recognition rate of general OCR technology rarely reaches 90%, which means that recognition results for important information often need manual proofreading. CRNN OCR has three problems here: in continuous texts of indefinite length, it distinguishes poorly between letters and visually similar Chinese characters; special cell contents appear in tables, such as strings randomly composed of "Chinese characters" + "digits" + "letters", which cannot be classified properly; and the cell images in a table cannot be segmented out directly.
To solve the problems in the related art, the present disclosure provides a standard-format table character recognition method and apparatus that determine a weight for each character type and combine the weights into the text recognizer. This enables text recognition of indefinite length, improves recognition accuracy, avoids as much as possible the misrecognition of visually similar characters in CRNN OCR, and reduces the cost of manual proofreading.
Hereinafter, the standard-format table character recognition method and apparatus according to the present disclosure will be described in detail with reference to fig. 1 to fig. 6.
Fig. 1 is a flowchart illustrating a standard-format table character recognition method according to an exemplary embodiment of the present disclosure. Referring to fig. 1, in step 101, at least one cell image to be recognized may be obtained according to a standard-format table image to be recognized, where each cell image to be recognized is an image of a cell of the table in the standard-format table image to be recognized.
According to an exemplary embodiment of the present disclosure, the standard-format table image to be recognized may include, but is not limited to, a table image acquired by a scanner or similar device.
According to an exemplary embodiment of the present disclosure, the standard-format table image to be recognized may be segmented into at least one cell image to be recognized by computer vision processing techniques; that is, the at least one cell image to be recognized together composes the standard-format table image to be recognized.
According to an exemplary embodiment of the present disclosure, the at least one cell image to be recognized may be stored as a queue to be recognized, and the queue may then be fed into the trained character classifier and the trained text recognizer for standard-format table character recognition, as sketched below.
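By way of illustration only, such a segmentation step could look like the following OpenCV sketch. The disclosure does not prescribe a specific computer vision algorithm, so the morphology-based line detection and every threshold here are assumptions:

```python
import cv2

def extract_cell_queue(table_image_path):
    """Minimal sketch: split a ruled standard-format table image into cell images."""
    img = cv2.imread(table_image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    # Isolate horizontal and vertical ruling lines, then merge them into a grid mask.
    h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
    v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))
    grid = cv2.bitwise_or(h_lines, v_lines)
    # Cell interiors appear as holes in the grid mask; RETR_CCOMP also returns them.
    contours, _ = cv2.findContours(grid, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    H, W = gray.shape
    queue = []  # the queue of cell images to be recognized
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if 10 < w < 0.95 * W and 10 < h < 0.95 * H:  # keep cell-sized regions only
            queue.append(img[y:y + h, x:x + w])
    return queue
```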
In step 102, at least one cell image to be recognized may be input into the trained character classifier to obtain a character classification result of each cell image to be recognized in the at least one cell image to be recognized, and at least one cell image to be recognized may be input into the trained text recognizer to obtain a first character recognition result of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, the character classifier may be a VGG network structure, and the character classifier may be trained by the following steps: acquiring a first training data set, wherein the first training data set is at least one cell image from at least one standard-format table image; inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image; determining a loss value of the character classifier according to the character classification result of each cell image and the true character result of each cell image; and training the character classifier by adjusting the parameters of the character classifier according to the loss value.
According to an exemplary embodiment of the present disclosure, the loss value of the character classifier may be calculated with a cross-entropy loss function.
According to an exemplary embodiment of the present disclosure, the character classifier may include a plurality of sequentially arranged convolution structures and at least one fully connected layer, each convolution structure including at least one convolution layer and one max-pooling layer. Inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image includes: inputting the first training data set into the first convolution structure to obtain a first output of each cell image; starting from i = 1 and iterating until i = n - 1, inputting the i-th output of each cell image into the (i + 1)-th convolution structure to obtain the (i + 1)-th output of each cell image, where n is the number of convolution structures and n is greater than 1; inputting the n-th output of each cell image into the at least one fully connected layer to obtain the (n + 1)-th output of each cell image; and obtaining the character classification result of each cell image from the (n + 1)-th output through a softmax function, wherein the at least one convolution layer of each convolution structure performs a convolution operation and the max-pooling layer of each convolution structure performs a max-pooling operation. The number of full-connection operations equals the number of fully connected layers.
According to an exemplary embodiment of the present disclosure, when n is 5, each convolution layer uses 3 × 3 convolution kernels, and each max-pooling layer performs 2 × 2 max pooling, inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image includes: inputting the first training data set into the first convolution structure, performing two 3 × 3 convolutions and one 2 × 2 max pooling in sequence to obtain a first output of each cell image; inputting the first output of each cell image into the second convolution structure, performing two 3 × 3 convolutions and one 2 × 2 max pooling in sequence to obtain a second output of each cell image; inputting the second output into the third convolution structure, performing three 3 × 3 convolutions and one 2 × 2 max pooling in sequence to obtain a third output; inputting the third output into the fourth convolution structure, performing three 3 × 3 convolutions and one 2 × 2 max pooling in sequence to obtain a fourth output; inputting the fourth output into the fifth convolution structure, performing three 3 × 3 convolutions and one 2 × 2 max pooling in sequence to obtain a fifth output; inputting the fifth output of each cell image into the at least one fully connected layer to obtain a sixth output of each cell image; and obtaining the character classification result corresponding to each cell image from the sixth output through a softmax function.
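A quick shape check of this five-structure configuration, reusing the hypothetical CharacterClassifier sketched earlier (input resized to 224 × 224 × 3):

```python
import torch

model = CharacterClassifier(num_classes=5)
x = torch.randn(1, 3, 224, 224)
for i, conv in enumerate(model.convs, start=1):
    x = conv(x)
    print(f"output {i}: {tuple(x.shape)}")
# (1, 64, 112, 112), (1, 128, 56, 56), (1, 256, 28, 28),
# (1, 512, 14, 14), (1, 512, 7, 7): each structure halves height and width.
```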
Fig. 2 is a schematic diagram illustrating the structure of a text recognizer according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the text recognizer may be organized into a convolutional layer, a recurrent layer, and a transcription layer: the convolutional layer extracts pixel features, the recurrent layer extracts sequence features, and the transcription layer induces the connection characteristics between characters. The convolutional layer may be a convolutional neural network (CNN), and the recurrent layer may be a bidirectional long short-term memory (BiLSTM) network, a type of RNN. A feature sequence is extracted from the cell image by the convolutional layer; the recurrent layer predicts the label (ground-truth) distribution of that feature sequence; and the transcription layer aligns the sequence with the Connectionist Temporal Classification (CTC) algorithm and converts the label distribution obtained from the recurrent layer into the character recognition result.
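A minimal PyTorch sketch of this three-layer design, assuming the input sizes given in the example later in this description (32-pixel-high cell images, two BiLSTM layers with 256 hidden units); the exact CNN layout is an assumption chosen to reproduce the 1 × (W/4) × 512 feature sequence:

```python
import torch.nn as nn

class CRNNTextRecognizer(nn.Module):
    """Convolutional layer (pixel features) + recurrent layer (sequence
    features); CTC transcription is applied outside the module."""
    def __init__(self, class_num, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(  # reduces a 3 x 32 x W input to a 512 x 1 x (W/4) map
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(), nn.MaxPool2d((8, 1)),
        )
        self.rnn = nn.LSTM(512, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, class_num)

    def forward(self, x):                  # x: (batch, 3, 32, W)
        f = self.cnn(x)                    # (batch, 512, 1, W/4)
        f = f.squeeze(2).permute(2, 0, 1)  # (W/4, batch, 512) feature sequence
        out, _ = self.rnn(f)
        return self.fc(out)                # (sequence_length, batch, class_num)
```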
According to an exemplary embodiment of the present disclosure, the text recognizer may include a convolutional neural network and a bidirectional long short-term memory network, and may be trained by the following steps: acquiring a second training data set, wherein the second training data set is at least one cell image from at least one standard-format table image; inputting the second training data set into the convolutional neural network to obtain a feature sequence of each cell image in the at least one cell image; inputting the feature sequence of each cell image into the BiLSTM network to obtain a second character recognition result of each cell image; determining the conditional probability of the second character recognition result of each cell image according to the CTC algorithm; determining the gradient of the BiLSTM network according to the conditional probability; and training the text recognizer by adjusting the parameters of the convolutional neural network and/or the BiLSTM network according to the gradient.
According to an exemplary embodiment of the present disclosure, the conditional probability may be represented as the following formula (1):

$$p(l \mid y) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid y) \qquad (1)$$

where $p(l \mid y)$ is the conditional probability that, given the input $y$, the bidirectional long short-term memory network outputs the label sequence $l$; $p(\pi \mid y)$ is the probability of the path $\pi$ through the bidirectional long short-term memory network given the input $y$; $\mathcal{B}^{-1}(l)$ denotes the set of all paths $\pi$ whose output after the $\mathcal{B}$ transform is $l$; and $\mathcal{B}$ is a compression transform.
Returning to fig. 1, in step 103, the weight of each character type of each cell image to be recognized may be determined according to the character classification result of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, a preset initial weight value of each character type of each cell image to be recognized may be acquired first, where the character types at least include a number type, a symbol type, an English type, a Chinese character type and other types; then, the preset initial weight value of each character type of each cell image to be recognized may be adjusted according to the character classification result of that cell image, so as to determine the weight of each character type of each cell image to be recognized, where the character classification result at least includes a numeric character set, a symbol character set, an English character set, a Chinese character set and other character sets.
According to an exemplary embodiment of the present disclosure, adjusting the preset initial weight value of each character type of each cell image to be recognized means increasing or decreasing each weight from its initial value. For example, suppose the preset initial weight value of the number, symbol, English, Chinese character and other types is 0.2 each. If the character classification result of a cell image to be recognized is the numeric character set, the preset initial weight values are adjusted so that the weights of the number, symbol, English, Chinese character and other types of that cell image are determined as 1, 0, 0, 0 and 0, respectively.
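A toy sketch of this weighting rule; the type names and the hard 1/0 adjustment follow the example above, and everything else is an assumption:

```python
CHARACTER_TYPES = ["number", "symbol", "english", "chinese", "other"]

def determine_weights(classification_result, initial=0.2):
    """Step 103 under the example above: every character type starts at the
    preset initial weight, then the weights are adjusted according to the
    classifier's predicted character set (predicted type -> 1, others -> 0)."""
    weights = dict.fromkeys(CHARACTER_TYPES, initial)
    for t in CHARACTER_TYPES:
        weights[t] = 1.0 if t == classification_result else 0.0
    return weights

# determine_weights("number")
# -> {"number": 1.0, "symbol": 0.0, "english": 0.0, "chinese": 0.0, "other": 0.0}
```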
In step 104, a second character recognition result of each cell image to be recognized can be obtained according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, the first character recognition result of each cell image to be recognized may be adjusted based on the weight of each character type of that cell image, so as to obtain the second character recognition result of each cell image to be recognized, where, for each cell image to be recognized, character types with higher weights account for a larger proportion of the second character recognition result, and character types with lower weights account for a smaller proportion.
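One plausible reading of this adjustment, sketched as a reweighting of the recognizer's per-step character probabilities; the exact combination rule is not spelled out in the disclosure, and `type_of_char` is an assumed helper mapping a character to its type name:

```python
def reweight_distribution(step_probs, type_weights, type_of_char):
    """Step 104 sketch: bias per-step character probabilities by the
    character-type weights, so higher-weighted types take a larger share."""
    reweighted = {ch: p * type_weights[type_of_char(ch)] for ch, p in step_probs.items()}
    total = sum(reweighted.values()) or 1.0
    return {ch: p / total for ch, p in reweighted.items()}  # renormalize

# With number weight 1 and all other weights 0, an ambiguous 'O'-vs-'0'
# prediction resolves to the digit '0'.
```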
In step 105, the second character recognition result may be filled into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized.
In step 106, the cell character recognition results can be spliced into a standard-format table character recognition result.
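Steps 105 and 106 can be sketched together as follows; the row-major cell indexing and the tab/newline output format are assumptions:

```python
def fill_and_splice(cell_results, n_rows, n_cols):
    """Fill each second character recognition result into a preset blank cell,
    then splice the cells into the full standard-format table result."""
    blank_cells = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for (row, col), text in cell_results.items():  # keyed by grid position
        blank_cells[row][col] = text
    return "\n".join("\t".join(row) for row in blank_cells)
```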
Fig. 3 is an overall framework diagram illustrating a standard-format table character recognition method according to an exemplary embodiment of the present disclosure. The standard-format table character recognition method of the exemplary embodiment of the present disclosure is described below by way of an example with reference to fig. 3.
First, at least one cell image to be recognized (cell images 1 to n to be recognized) may be obtained according to the standard-format table image to be recognized.
Then, the at least one cell image to be recognized may be input into the trained character classifier to obtain a character classification result of each cell image to be recognized. The character classifier may be a VGG16 network structure.
Fig. 4 is a schematic diagram illustrating the structure of a character classifier according to an exemplary embodiment of the present disclosure. Referring to fig. 4, there are 5 convolution structures and 3 fully connected layers: the first two convolution structures each include two 3 × 3 convolution kernels and one 2 × 2 max pooling, and the third, fourth and fifth convolution structures each include three 3 × 3 convolution kernels and one 2 × 2 max pooling. From left to right, the blocks in fig. 4 are: the cell image to be recognized, two 3 × 3 convolutions, 2 × 2 max pooling, two 3 × 3 convolutions, 2 × 2 max pooling, three 3 × 3 convolutions, 2 × 2 max pooling, three 3 × 3 convolutions, 2 × 2 max pooling, three 3 × 3 convolutions, 2 × 2 max pooling, the fully connected layers, and softmax.
Based on this, inputting the at least one cell image to be recognized into the trained character classifier to obtain a character classification result of each cell image to be recognized may proceed as follows: resize each cell image to be recognized to (224, 224, 3); input it into the first convolution structure, where two 3 × 3 convolutions output 64 feature channels, giving (224, 224, 64), and 2 × 2 max pooling gives a first output of (112, 112, 64); input the first output into the second convolution structure, where two 3 × 3 convolutions output 128 feature channels, giving (112, 112, 128), and 2 × 2 max pooling gives a second output of (56, 56, 128); input the second output into the third convolution structure, where three 3 × 3 convolutions output 256 feature channels, giving (56, 56, 256), and 2 × 2 max pooling gives a third output of (28, 28, 256); input the third output into the fourth convolution structure, where three 3 × 3 convolutions output 512 feature channels, giving (28, 28, 512), and 2 × 2 max pooling gives a fourth output of (14, 14, 512); input the fourth output into the fifth convolution structure, where three 3 × 3 convolutions output 512 feature channels, giving (14, 14, 512), and 2 × 2 max pooling gives a fifth output of (7, 7, 512); input the fifth output into the fully connected layers (4096 neurons, then fully connected to a 1000-dimensional output) to obtain a sixth output of each cell image to be recognized; and obtain the character classification result of each cell image to be recognized from the sixth output through a softmax function.
At the same time, the at least one cell image to be recognized may be input into the trained text recognizer to obtain a first character recognition result of each cell image to be recognized. The text recognizer includes a convolutional layer (CNN), a recurrent layer (two BiLSTM layers, each with 256 hidden units), and a transcription layer.
Based on this, inputting the at least one cell image to be recognized into the trained text recognizer to obtain the first character recognition result of each cell image to be recognized may proceed as follows: scale each cell image to be recognized to 32 (height) × W (width) × 3 (channels); input the scaled cell image into the convolutional layer to obtain a feature sequence of 1 × (W/4) × 512; and input the feature sequence into the recurrent layer to obtain the first character recognition result of each cell image to be recognized (the output dimension is (sequence_length, batch, class_num), where class_num is the total number of character classes).
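An inference sketch matching these shapes, with greedy CTC decoding; the decoding strategy is an assumption, since the disclosure only fixes the output dimensions:

```python
import cv2
import torch

def recognize_cell(recognizer, cell_image, charset, blank=0):
    """Scale the cell image to 32 x W x 3, run a CRNN such as the one sketched
    earlier, and greedy-decode the (sequence_length, batch, class_num) output."""
    h, w = cell_image.shape[:2]
    x = cv2.resize(cell_image, (max(4, round(w * 32 / h)), 32))  # 32 x W x 3
    x = torch.from_numpy(x).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    logits = recognizer(x)                         # (W/4, 1, class_num)
    best = logits.argmax(dim=2).squeeze(1).tolist()
    decoded, prev = [], blank
    for k in best:                                 # merge repeats, drop blanks
        if k != prev and k != blank:
            decoded.append(charset[k])
        prev = k
    return "".join(decoded)
```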
Next, the weight of each character type of each cell image to be recognized may be determined according to the character classification result of each cell image to be recognized.
Specifically, a preset initial weight value of each character type of each cell image to be recognized may be acquired first, where the character types at least include a number type, a symbol type, an English type, a Chinese character type and other types; then, the preset initial weight value of each character type of each cell image to be recognized may be adjusted according to the character classification result of that cell image, so as to determine the weight of each character type of each cell image to be recognized, where the character classification result at least includes a numeric character set, a symbol character set, an English character set, a Chinese character set and other character sets.
Then, a second character recognition result of each cell image to be recognized can be obtained according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized.
Then, the second character recognition result can be filled into the preset blank cells to obtain the cell character recognition result of each cell image to be recognized.
Finally, the cell character recognition results can be spliced into a standard-format table character recognition result.
Fig. 5 is a block diagram illustrating a standard-format table character recognition apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 5, the standard-format table character recognition apparatus 500 includes a cell image acquisition unit 501, a prediction unit 502, a weight acquisition unit 503, a weighting unit 504, a filling unit 505, and a splicing unit 506.
The cell image acquisition unit 501 may obtain at least one cell image to be recognized according to the standard-format table image to be recognized, where each cell image to be recognized is an image of a cell of the table in the standard-format table image to be recognized.
According to an exemplary embodiment of the present disclosure, the standard-format table image to be recognized may include, but is not limited to, a table image acquired by a scanner or similar device.
According to an exemplary embodiment of the present disclosure, the standard-format table image to be recognized may be segmented into at least one cell image to be recognized by computer vision processing techniques; that is, the at least one cell image to be recognized together composes the standard-format table image to be recognized.
According to an exemplary embodiment of the present disclosure, the at least one cell image to be recognized may be stored as a queue to be recognized, and the queue may then be fed into the trained character classifier and the trained text recognizer for standard-format table character recognition.
The prediction unit 502 may input the at least one cell image to be recognized into the trained character classifier to obtain a character classification result of each cell image to be recognized, and input the at least one cell image to be recognized into the trained text recognizer to obtain a first character recognition result of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, the character classifier may be a VGG network structure, and its training process is as follows: acquiring a first training data set, wherein the first training data set is at least one cell image from at least one standard-format table image; inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image; determining a loss value of the character classifier according to the character classification result of each cell image and the true character result of each cell image; and training the character classifier by adjusting the parameters of the character classifier according to the loss value.
According to an exemplary embodiment of the present disclosure, the loss value of the character classifier may be calculated with a cross-entropy loss function.
According to an exemplary embodiment of the present disclosure, the character classifier may include a plurality of sequentially arranged convolution structures and at least one fully connected layer, each convolution structure including at least one convolution layer and one max-pooling layer. In the training process of the character classifier, inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image includes: inputting the first training data set into the first convolution structure to obtain a first output of each cell image; starting from i = 1 and iterating until i = n - 1, inputting the i-th output of each cell image into the (i + 1)-th convolution structure to obtain the (i + 1)-th output of each cell image, where n is the number of convolution structures and n is greater than 1; inputting the n-th output of each cell image into the at least one fully connected layer to obtain the (n + 1)-th output of each cell image; and obtaining the character classification result of each cell image from the (n + 1)-th output through a softmax function, wherein the at least one convolution layer of each convolution structure performs a convolution operation and the max-pooling layer of each convolution structure performs a max-pooling operation. The number of full-connection operations equals the number of fully connected layers.
According to an exemplary embodiment of the present disclosure, the text recognizer may include a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network, and its training process may be: acquiring a second training data set, wherein the second training data set is at least one cell image from at least one standard-format table image; inputting the second training data set into the convolutional neural network to obtain a feature sequence of each cell image in the at least one cell image; inputting the feature sequence of each cell image into the BiLSTM network to obtain a second character recognition result of each cell image; determining the conditional probability of the second character recognition result of each cell image according to the CTC algorithm; determining the gradient of the BiLSTM network according to the conditional probability; and training the text recognizer by adjusting the parameters of the convolutional neural network and/or the BiLSTM network according to the gradient.
According to an exemplary embodiment of the present disclosure, the conditional probability may be represented as the above equation (1).
The weight acquisition unit 503 may determine the weight of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, the weight acquisition unit 503 may first acquire a preset initial weight value of each character type of each cell image to be recognized, where the character types at least include a number type, a symbol type, an English type, a Chinese character type and other types; then, the weight acquisition unit 503 may adjust the preset initial weight value of each character type of each cell image to be recognized according to the character classification result of that cell image, so as to determine the weight of each character type of each cell image to be recognized, where the character classification result at least includes a numeric character set, a symbol character set, an English character set, a Chinese character set and other character sets.
According to an exemplary embodiment of the present disclosure, adjusting the preset initial weight value of each character type of each cell image to be recognized means increasing or decreasing each weight from its initial value. For example, suppose the preset initial weight value of the number, symbol, English, Chinese character and other types is 0.2 each. If the character classification result of a cell image to be recognized is the numeric character set, the preset initial weight values are adjusted so that the weights of the number, symbol, English, Chinese character and other types of that cell image are determined as 1, 0, 0, 0 and 0, respectively.
The weighting unit 504 may obtain a second text recognition result of each cell image to be recognized according to the first text recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized.
According to an exemplary embodiment of the present disclosure, the weighting unit 504 may adjust the first character recognition result of each cell image to be recognized based on the magnitude of the weight of each character type of that cell image, so as to obtain the second character recognition result of each cell image to be recognized, where, for each cell image to be recognized, character types with higher weights account for a larger proportion of the second character recognition result, and character types with lower weights account for a smaller proportion.
The filling unit 505 may fill the second character recognition result into the preset blank cells to obtain a cell character recognition result of each cell image to be recognized.
The splicing unit 506 may splice the cell character recognition results into a standard-format table character recognition result.
Fig. 6 is a block diagram illustrating an electronic device 600 according to an example embodiment of the present disclosure.
Referring to fig. 6, an electronic device 600 includes at least one memory 601 and at least one processor 602. The at least one memory 601 stores a set of computer-executable instructions that, when executed by the at least one processor 602, cause the at least one processor 602 to perform the standard-format table character recognition method according to exemplary embodiments of the present disclosure.
By way of example, the electronic device 600 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device 600 need not be a single electronic device; it can be any arrangement or collection of circuits capable of executing the above instructions (or instruction sets), individually or in combination. The electronic device 600 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 600, the processor 602 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 602 may execute instructions or code stored in the memory 601, wherein the memory 601 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 601 may be integrated with the processor 602, for example, with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 601 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 601 and the processor 602 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 602 can read files stored in the memory.
Further, the electronic device 600 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 600 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the standard-format table character recognition method according to the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or an eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server; in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system, so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, in which instructions are executable by a processor of a computer device to perform the standard-format table character recognition method according to an exemplary embodiment of the present disclosure.
According to the standard-format table character recognition method and apparatus of the present disclosure, the weight of each character type is determined and combined into the text recognizer, so that text of indefinite length can be recognized, recognition accuracy is improved, the misrecognition of visually similar characters in CRNN OCR is avoided as much as possible, and the cost of manual proofreading is reduced.
In addition, according to the standard-format table character recognition method and apparatus of the present disclosure, each cell image to be recognized is recognized individually, making the recognition more accurate.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A system table character recognition method, comprising:
obtaining at least one cell image to be recognized from a system table image to be recognized, wherein each cell image to be recognized is an image of a cell of the table in the system table image to be recognized;
inputting the at least one cell image to be recognized into a trained character classifier to obtain a character classification result for each cell image to be recognized among the at least one cell image to be recognized, and inputting the at least one cell image to be recognized into a trained text recognizer to obtain a first character recognition result for each cell image to be recognized;
determining the weight of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized;
obtaining a second character recognition result of each cell image to be recognized according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized;
filling the second character recognition result into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized;
splicing the cell character recognition results into a system table character recognition result;
wherein determining the weight of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized comprises:
acquiring a preset weight initial value of each character type of each cell image to be recognized, wherein the character types at least comprise a numeric type, a symbol type, an English type, a Chinese character type, and an other type; and
adjusting the preset weight initial value of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized, so as to determine the weight of each character type of each cell image to be recognized, wherein the character classification result at least comprises a numeric character set, a symbol character set, an English character set, a Chinese character set, and an other character set;
wherein obtaining the second character recognition result of each cell image to be recognized according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized comprises:
adjusting the first character recognition result of each cell image to be recognized based on the weight of each character type of that cell image, so as to obtain the second character recognition result of the cell image, wherein, for each cell image to be recognized, character types with higher weights account for a larger proportion of the second character recognition result and character types with lower weights account for a smaller proportion.
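For illustration only, the weighting step of claim 1 can be sketched in a few lines of Python. The patent publishes no code, so the helper names (`char_type`, `init_weights`, `second_recognition`), the per-position candidate dictionaries, and the multiplicative re-scoring rule are all hypothetical assumptions, chosen only to match the claim language that higher-weight character types end up occupying more of the second recognition result.

```python
# Hypothetical sketch of the claim-1 weighting step: re-score the text
# recognizer's per-character candidates with per-type weights derived
# from the character classifier, then pick the best candidate per slot.
CHAR_TYPES = ("digit", "symbol", "english", "chinese", "other")

def char_type(ch: str) -> str:
    """Crude stand-in for the patent's five character types."""
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "english"
    if "\u4e00" <= ch <= "\u9fff":
        return "chinese"
    if not ch.isalnum():
        return "symbol"
    return "other"

def init_weights(type_counts: dict) -> dict:
    """Preset initial weight 1.0 per type, raised in proportion to how
    many characters the classifier assigned to that type."""
    total = sum(type_counts.values()) or 1
    return {t: 1.0 + type_counts.get(t, 0) / total for t in CHAR_TYPES}

def second_recognition(first_result: list, weights: dict) -> str:
    """first_result: one candidate->probability dict per character slot,
    as produced by the text recognizer (the first recognition result)."""
    out = []
    for candidates in first_result:
        best = max(candidates,
                   key=lambda ch: candidates[ch] * weights[char_type(ch)])
        out.append(best)
    return "".join(out)

if __name__ == "__main__":
    # A mostly numeric cell: the classifier saw 9 digits and 1 letter, so
    # the digit weight rises and "0" now beats the look-alike letter "O".
    first = [{"0": 0.45, "O": 0.50}, {"1": 0.90, "l": 0.08}]
    print(second_recognition(first, init_weights({"digit": 9, "english": 1})))  # "01"
```

The example shows the intended effect on similar-character errors: in a cell the classifier judges mostly numeric, the digit "0" outscores the visually similar letter "O" after weighting.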
2. The recognition method of claim 1, wherein the character classifier has a VGG network structure and is trained by:
acquiring a first training data set, wherein the first training data set comprises at least one cell image from at least one system table image;
inputting the first training data set into the character classifier to obtain a character classification result of each cell image in the at least one cell image;
determining a loss value of the character classifier according to the character classification result of each cell image and the ground-truth character result of each cell image; and
training the character classifier by adjusting parameters of the character classifier according to the loss value.
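A minimal sketch of this training procedure, assuming PyTorch (the patent names no framework, dataset class, or hyperparameters; all of those are invented here):

```python
# Hypothetical PyTorch training loop for the claim-2 classifier.
import torch
import torch.nn as nn

def train_classifier(model: nn.Module,
                     loader: "torch.utils.data.DataLoader",
                     epochs: int = 10,
                     lr: float = 1e-3) -> None:
    loss_fn = nn.CrossEntropyLoss()               # loss vs. ground-truth results
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for cell_images, true_types in loader:    # the first training data set
            logits = model(cell_images)           # character classification result
            loss = loss_fn(logits, true_types)    # loss value of the classifier
            opt.zero_grad()
            loss.backward()                       # adjust classifier parameters
            opt.step()
```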
3. The recognition method of claim 2, wherein the character classifier comprises a plurality of convolution structures arranged in sequence and at least one fully connected layer, each convolution structure comprising at least one convolutional layer and one max pooling layer;
wherein inputting the first training data set into the character classifier to obtain the character classification result of each cell image in the at least one cell image comprises:
inputting the first training data set into a first convolution structure to obtain a first output of each cell image;
iteratively performing, from i = 1 until i = n - 1, an operation of inputting the i-th output of each cell image into the (i+1)-th convolution structure to obtain the (i+1)-th output of each cell image, where n is the number of the plurality of convolution structures and n is greater than 1;
inputting the n-th output of each cell image into the at least one fully connected layer to obtain the (n+1)-th output of each cell image; and
obtaining the character classification result of each cell image from the (n+1)-th output of each cell image through a softmax function,
wherein the at least one convolutional layer of each convolution structure performs a convolution operation, and the max pooling layer of each convolution structure performs a max pooling operation.
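The convolution-structure stack of claim 3 could look as follows; the number of structures (n = 3), the channel widths, and the grayscale input are illustrative assumptions, not values from the patent:

```python
# Hypothetical VGG-style classifier with the claim-3 shape: n convolution
# structures (each at least one conv layer plus one max pooling layer)
# followed by fully connected layers and a softmax.
import torch
import torch.nn as nn

class VGGCharClassifier(nn.Module):
    def __init__(self, n_types: int = 5):
        super().__init__()
        blocks, in_ch = [], 1                         # grayscale cell images
        for out_ch in (64, 128, 256):                 # three convolution structures
            blocks += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2)]               # max pooling ends each structure
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)        # i-th output feeds structure i+1
        self.pool = nn.AdaptiveAvgPool2d((4, 4))
        self.fc = nn.Sequential(nn.Flatten(),         # the fully connected layers
                                nn.Linear(256 * 4 * 4, 512), nn.ReLU(),
                                nn.Linear(512, n_types))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(self.features(x)))   # logits: the (n+1)-th output

    @torch.no_grad()
    def predict(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.forward(x), dim=1)  # claim-3 softmax step
```

`forward` returns raw logits so the cross-entropy loop sketched above applies directly; `predict` adds the softmax that yields the classification result.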
4. The recognition method of claim 1, wherein the text recognizer comprises a convolutional neural network and a bidirectional long short-term memory network, the text recognizer being trained by:
acquiring a second training data set, wherein the second training data set comprises at least one cell image from at least one system table image;
inputting the second training data set into the convolutional neural network to obtain a feature sequence of each cell image in the at least one cell image;
inputting the feature sequence of each cell image into the bidirectional long short-term memory network to obtain a second character recognition result of each cell image;
determining the conditional probability of the second character recognition result of each cell image according to a connectionist temporal classification (CTC) algorithm;
determining the gradient of the bidirectional long short-term memory network according to the conditional probability; and
training the text recognizer by adjusting parameters of the convolutional neural network and/or the bidirectional long short-term memory network according to the gradient.
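A hedged sketch of this CNN-plus-BiLSTM recognizer and one CTC training step, assuming PyTorch, 32-pixel-high grayscale cell crops, and otherwise arbitrary sizes (none of which are specified by the patent):

```python
# Hypothetical claim-4 recognizer: CNN feature sequence -> bidirectional
# LSTM -> CTC loss, whose gradient trains both networks.
import torch
import torch.nn as nn

class CRNNRecognizer(nn.Module):
    def __init__(self, n_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                   # convolutional neural network
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True)  # BiLSTM
        self.fc = nn.Linear(2 * hidden, n_classes)  # n_classes includes the CTC blank

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        f = self.cnn(images)                        # (N, C, H, W); H = 8 for 32-px input
        n, c, h, w = f.shape
        seq = f.permute(3, 0, 1, 2).reshape(w, n, c * h)  # width is the time axis
        out, _ = self.rnn(seq)                      # feature sequence -> BiLSTM
        return self.fc(out).log_softmax(2)          # (T, N, n_classes) log-probs

def train_step(model, images, targets, target_lengths, opt):
    """One update: the CTC conditional probability supplies the gradient."""
    log_probs = model(images)
    t, n, _ = log_probs.shape
    input_lengths = torch.full((n,), t, dtype=torch.long)
    loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
    opt.zero_grad()
    loss.backward()                                 # gradient through BiLSTM and CNN
    opt.step()                                      # adjust recognizer parameters
    return loss.item()
```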
5. The recognition method of claim 4, wherein the conditional probability is expressed as:

$$p(l \mid x) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid x)$$

where $p(l \mid x)$ is the conditional probability of outputting the label sequence $l$ when $x$ is input into the bidirectional long short-term memory network, $p(\pi \mid x)$ is the probability of the path $\pi$ when $x$ is input into the bidirectional long short-term memory network, $B^{-1}(l)$ denotes the set of all paths $\pi$ whose output after the transform $B$ is $l$, and $B$ is a compression transform.
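To make the compression transform concrete, the toy code below (not from the patent) implements $B$, which collapses runs of repeated characters and deletes blanks, and evaluates $p(l \mid x)$ by brute-force enumeration of $B^{-1}(l)$; real CTC implementations use the forward-backward dynamic program instead of this exponential enumeration:

```python
# Toy illustration of the compression transform B and a brute-force
# p(l|x) as the sum of path probabilities over B^{-1}(l).
from itertools import product

BLANK = "-"

def B(path: str) -> str:
    """Collapse runs of repeated characters, then delete blanks."""
    collapsed = [c for i, c in enumerate(path) if i == 0 or c != path[i - 1]]
    return "".join(c for c in collapsed if c != BLANK)

def p_label(label: str, frame_probs: list) -> float:
    """Sum p(pi|x) over every path pi of length T with B(pi) == label."""
    alphabet = list(frame_probs[0])
    total = 0.0
    for path in product(alphabet, repeat=len(frame_probs)):
        if B("".join(path)) == label:
            p = 1.0
            for t, ch in enumerate(path):
                p *= frame_probs[t][ch]     # per-frame output probability
            total += p
    return total

if __name__ == "__main__":
    print(B("aa-a"))                # "aa": repeats collapse, the blank splits them
    frames = [{"a": 0.6, BLANK: 0.4}] * 3
    print(p_label("a", frames))     # sums paths "aaa", "aa-", "a--", "-aa", "-a-", "--a"
```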
6. A system table character recognition apparatus, comprising:
a cell image acquisition unit configured to: obtain at least one cell image to be recognized from a system table image to be recognized, wherein each cell image to be recognized is an image of a cell of the table in the system table image to be recognized;
a prediction unit configured to: input the at least one cell image to be recognized into a trained character classifier to obtain a character classification result for each cell image to be recognized among the at least one cell image to be recognized, and input the at least one cell image to be recognized into a trained text recognizer to obtain a first character recognition result for each cell image to be recognized;
a weight acquisition unit configured to: determine the weight of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized;
a weighting unit configured to: obtain a second character recognition result of each cell image to be recognized according to the first character recognition result of each cell image to be recognized and the weight of each character type of each cell image to be recognized;
a filling unit configured to: fill each second character recognition result into a preset blank cell to obtain a cell character recognition result of each cell image to be recognized;
a stitching unit configured to: splice the cell character recognition results into a system table character recognition result;
wherein the weight acquisition unit is configured to: acquire a preset weight initial value of each character type of each cell image to be recognized, the character types at least comprising a numeric type, a symbol type, an English type, a Chinese character type, and an other type; and adjust the preset weight initial value of each character type of each cell image to be recognized according to the character classification result of each cell image to be recognized, so as to determine the weight of each character type of each cell image to be recognized, the character classification result at least comprising a numeric character set, a symbol character set, an English character set, a Chinese character set, and an other character set;
wherein the weighting unit is configured to: adjust the first character recognition result of each cell image to be recognized based on the weight of each character type of that cell image, so as to obtain the second character recognition result of the cell image, wherein, for each cell image to be recognized, character types with higher weights account for a larger proportion of the second character recognition result and character types with lower weights account for a smaller proportion.
7. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the system table character recognition method of any one of claims 1 to 5.
8. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the system table character recognition method of any one of claims 1 to 5.
CN202111358601.6A 2021-11-17 2021-11-17 System table character recognition method and device Active CN113807326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358601.6A CN113807326B (en) 2021-11-17 2021-11-17 System table character recognition method and device

Publications (2)

Publication Number Publication Date
CN113807326A (en) 2021-12-17
CN113807326B (en) 2022-02-25

Family

ID=78898680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358601.6A Active CN113807326B (en) 2021-11-17 2021-11-17 System table character recognition method and device

Country Status (1)

Country Link
CN (1) CN113807326B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359536A (en) * 2021-12-31 2022-04-15 中国电信股份有限公司 Training method and device of character recognition model, storage medium and electronic equipment
CN116486422A (en) * 2022-01-12 2023-07-25 华为技术有限公司 Data processing method and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898137A (en) * 2018-05-25 2018-11-27 黄凯 A kind of natural image character identifying method and system based on deep neural network
CN109460769A (en) * 2018-11-16 2019-03-12 湖南大学 A kind of mobile end system and method based on table character machining and identification
CN109993112A (en) * 2019-03-29 2019-07-09 杭州睿琪软件有限公司 The recognition methods of table and device in a kind of picture

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3081245B1 (en) * 2018-05-17 2020-06-19 Idemia Identity & Security France CHARACTER RECOGNITION PROCESS

Also Published As

Publication number Publication date
CN113807326A (en) 2021-12-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant