CN112883956A - Text character recognition method and device and server - Google Patents
Text character recognition method and device and server
- Publication number
- CN112883956A (application CN202110300713.XA)
- Authority
- CN
- China
- Prior art keywords
- layer
- character recognition
- convolutional network
- network layer
- text characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06F18/253—Fusion techniques of extracted features (pattern recognition)
- G06N3/08—Learning methods (neural networks)
- G06V30/10—Character recognition
Abstract
The specification provides a text character recognition method, device, and server. Based on the method, a preset character recognition model comprising at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module can be trained and established in advance, with a cross-layer connection arranged between the low-layer and high-layer convolutional network layers. In implementation, after a target image to be processed is acquired, the preset character recognition model is called to process the target image and obtain a corresponding processing result, and the target text characters contained in the target image are determined from that result. Because the called model supports multi-scale feature extraction and performs well, the method handles complex recognition scenes such as overlapping text characters, recognizes text characters in images accurately and efficiently, reduces recognition errors, and improves the accuracy of character recognition.
Description
Technical Field
The specification belongs to the technical field of artificial intelligence, and particularly relates to a text character recognition method, a text character recognition device and a server.
Background
In many data processing scenarios, a system often directly receives image data containing text characters. The system then needs to perform text character recognition (for example, OCR) on the image data to extract the text characters it contains, and subsequently carries out specific data processing based on the extracted characters.
However, in some complex recognition scenarios, for example when text characters in an image overlap for some reason, the characters become difficult to recognize, and it is often hard to accurately recognize the real text characters in the image with existing methods.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The specification provides a text character recognition method, device, and server that can be applied to complex recognition scenes such as overlapping text characters and can accurately and efficiently recognize and determine the text characters in an image.
The present specification provides a text character recognition method, including:
acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model comprises at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, with a cross-layer connection also arranged between the low-layer convolutional network layer and the high-layer convolutional network layer;
and determining the target text characters according to the processing result.
In one embodiment, the lower convolutional network layer comprises: a first convolution layer, a second convolution layer and a third convolution layer; the high-level convolutional network layer includes: a fourth convolutional layer and a fifth convolutional layer; the fire module includes a first fire module and a second fire module.
In one embodiment, the first, second, third, fourth, and fifth convolutional layers are sequentially connected in series; and a first fire module and a second fire module are sequentially connected in series between the fourth convolution layer and the fifth convolution layer.
In one embodiment, a cross-layer connection is provided between the third convolutional layer and the fifth convolutional layer; and/or a cross-layer connection is arranged between the first convolution layer and the fourth convolution layer.
In one embodiment, before acquiring the target image to be processed, the method further comprises:
constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
acquiring a sample image; wherein the sample image contains text characters with overlap;
establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
and training the initial model by using the marked training set and the test set to obtain a preset character recognition model meeting the requirements.
In one embodiment, acquiring a sample image comprises:
collecting first picture data containing text characters;
performing expansion processing on the first picture data to obtain second picture data;
according to the text characters, the second picture data are segmented to obtain a plurality of third picture data;
and screening out a sample image containing overlapped text characters from the plurality of third picture data.
In one embodiment, the method further comprises:
determining the receptive field range of the features;
and adjusting the size parameters of the convolution kernels used by the initial low-layer convolutional network layer and the initial high-layer convolutional network layer according to the receptive field range of the features.
In one embodiment, after acquiring the sample image, the method further comprises:
calculating the average value and the variance of the sample image according to the sample image;
and carrying out batch standardization processing on the sample images according to the mean value and the variance of the sample images.
In one embodiment, the target image includes at least one of: pictures containing bills, pictures containing certificates and pictures containing contracts.
The present specification also provides a text character recognition method, including:
acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model comprises at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module;
and determining the target text characters according to the processing result.
The present specification also provides a method for establishing a preset character recognition model, including:
constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
acquiring a sample image; wherein the sample image contains text characters with overlap;
establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
and training the initial model by using the marked training set and the test set to obtain a preset character recognition model meeting the requirements.
This specification also provides a text character recognition apparatus including:
the acquisition module is used for acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
the calling module is used for calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model comprises at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, with a cross-layer connection also arranged between the low-layer convolutional network layer and the high-layer convolutional network layer;
and the determining module is used for determining the target text characters according to the processing result.
The present specification also provides a server comprising a processor and a memory for storing processor-executable instructions, the processor implementing the steps associated with the method for recognizing text characters when executing the instructions.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which, when executed, implement the steps associated with the method of text character recognition.
The specification provides a text character recognition method, device, and server. Based on the method, a preset character recognition model comprising at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, with a cross-layer connection between the low-layer and high-layer convolutional network layers, can be trained and established in advance. In implementation, after a target image to be processed is acquired, the preset character recognition model can be called to process the target image and obtain a corresponding processing result, and the target text characters contained in the target image are determined from that result. The called model supports multi-scale feature extraction and performs well, so the method is effective in complex recognition scenes in which characters, such as overlapping text characters, are difficult to recognize; text characters in an image can be recognized accurately and efficiently, recognition errors are reduced, and the accuracy of character recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present specification, the drawings needed to be used in the embodiments will be briefly described below, and the drawings in the following description are only some of the embodiments described in the present specification, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic diagram of an embodiment of a structural component of a system to which a text character recognition method provided in an embodiment of the present specification is applied;
FIG. 2 is a flow chart illustrating a method for recognizing text characters according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a preset character recognition model in a text character recognition method provided in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a fire module in a text character recognition method provided in an embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating a method for recognizing text characters according to an embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a method for establishing a predetermined character recognition model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a server according to an embodiment of the present disclosure;
fig. 8 is a schematic structural component diagram of a text character recognition apparatus provided in an embodiment of the present specification;
fig. 9 is a schematic diagram of an embodiment of a method for recognizing text characters, which is provided by an embodiment of the present specification, in a scene example.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
The embodiment of the specification provides a text character recognition method. The text character recognition method can be particularly applied to a system comprising a server and terminal equipment. Specifically, as shown in fig. 1, the terminal device and the server may be connected in a wired or wireless manner to perform specific data interaction.
In this embodiment, the server may specifically include a background server disposed on one side of the service data processing platform and capable of implementing functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device having data operation, storage function and network interaction function. Alternatively, the server may be a software program running in the electronic device and providing support for data processing, storage and network interaction. In the present embodiment, the number of servers is not particularly limited. The server may specifically be one server, or may also be several servers, or a server cluster formed by several servers.
In this embodiment, the terminal device may specifically include a front-end device that is disposed at a user side and is capable of implementing functions such as image data acquisition and image data transmission. Specifically, the terminal device may be, for example, a monitoring camera, or may also be a desktop computer, a tablet computer, a notebook computer, a smart phone, and the like, which are provided with a camera. Or, the terminal device may also be a software application that can run in the electronic device and supports calling a camera of the electronic device to acquire image data. For example, it may be some APP running on a smartphone, etc.
In specific implementation, the terminal device may collect a photo containing target text characters to be recognized and extracted as a target image to be processed. For example, a business clerk at a certain bank can use a smartphone equipped with a camera as a terminal device to capture a bill provided by a user as a target image.
Then, the terminal device may send the acquired target image to a server in a wired or wireless manner. Correspondingly, the server acquires the target image.
Further, the server may invoke a preset character recognition model trained in advance using sample images containing text characters having overlap to process the target image to output a corresponding processing result.
The model structure of the preset character recognition model is obtained by improving the structure specifically for complex recognition scenes such as overlapping text characters.
Specifically, the preset character recognition model includes at least a low-layer convolutional network layer for extracting low-layer features, a high-layer convolutional network layer for extracting high-layer features, and a fire module for deepening and widening the convolutional network so as to obtain richer features at more scales. A cross-layer connection is further arranged between the low-layer convolutional network layer and the high-layer convolutional network layer, so that the two different types of image features, the low-layer features and the high-layer features, can later be fused simultaneously and the text characters in the image recognized more accurately.
In specific implementation, the server can process the target image through the convolutional network layers and the fire module in the preset character recognition model to extract accurate, multi-scale features (which may also be called feature vectors, feature maps, feature matrices, and the like), and can then call a classifier in the preset character recognition model to obtain a corresponding processing result based on these features.
Furthermore, the server can identify the target text characters contained in the target image according to the processing result, and perform specific target data processing according to the identified target text characters.
For example, a server of a data processing system of a certain bank may recognize a target text character on a bill from a target image, further extract key information such as payer information, payee information, and bill amount related to the bill based on the recognized target text character, and implement electronic filing and storage of data for the bill based on the key information.
Through the system, the method and the device can be effectively applied to complex recognition scenes such as overlapped text characters and the like, and can accurately and efficiently recognize and determine the text characters in the image.
Referring to fig. 2, an embodiment of the present disclosure provides a method for recognizing text characters. The method is particularly applied to the server side. In particular implementations, the method may include the following.
S201: acquiring a target image to be processed; wherein the target image contains target text characters to be recognized.
In some embodiments, the target image may be a to-be-processed picture containing the target text characters to be recognized, for example a photograph containing the target text characters, or a video frame containing the target text characters.
In some embodiments, in a banking scenario, the target image may be a picture containing a bill; accordingly, the text characters on the bill (e.g., payee information, payer information, invoice amount) may be the target text characters to be recognized. In a certificate verification scenario, the target image may be a picture containing a certificate; accordingly, the text characters on the certificate (e.g., name, certificate number, native place) may be the target text characters. In a contract archiving scenario, the target image may be a picture containing a contract; accordingly, the text characters on the contract (e.g., contract terms, contract signatures) may be the target text characters to be identified. Of course, the target images and target text characters listed above are only illustrative; in specific implementations, other types of target images and target text characters can be introduced according to the application scene.
Through the embodiment, the text character recognition method provided by the specification can be popularized and applied to various different application scenes so as to process various different target images.
In some embodiments, taking a banking business handling scenario as an example, sometimes it is necessary to collect an image including a bill (or a form) provided by a user, recognize and extract text characters on the bill in the image, handle a corresponding business for the user according to the recognized and extracted text characters, or electronically document the bill and other related data.
However, in an image containing a bill, characters on the bill may overlap with other characters (overlapping text characters) because, for example, the bill paper is relatively thin or the ink of the pen used for writing is relatively heavy, making the characters on the bill difficult to recognize. Alternatively, due to factors such as the shooting angle, ambient light, or the camera itself, the characters on the bill in the captured image may become blurred and hard to recognize.
For such complex recognition scenes, in which overlapping text characters make the characters in the image difficult to recognize, existing text character recognition methods are prone to errors and have low accuracy. For example, an existing method may be disturbed by the overlapping characters during recognition, so that the real text characters cannot be recognized.
In some embodiments, in order to be simultaneously suitable for the complex recognition scenes, accurately and efficiently recognize and determine text characters in the images, and improve recognition accuracy, a preset character recognition model meeting requirements can be trained in advance. How to train and establish the predetermined character recognition model will be described later.
S202: calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model comprises at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, with a cross-layer connection further arranged between the low-layer convolutional network layer and the high-layer convolutional network layer.
In some embodiments, referring to fig. 3, the preset character recognition model used includes at least the following structures: a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module; a corresponding cross-layer connection can further be arranged between the low-layer convolutional network layer and the high-layer convolutional network layer.
The low-level convolutional network layer may be specifically configured to extract low-level features, such as low-level features of color and size. The high-level convolutional network layer may be specifically configured to extract, based on the low-level features, corresponding high-level features, such as high-level features like textures and lines, by performing nonlinear transformation.
The fire module refers to the squeeze-and-expand module proposed for compressed neural networks (such as SqueezeNet) and comprises two parts: a compression layer and an expansion layer. It enriches the feature expression capability of a convolutional neural network (for example, a shallow convolutional network) in both depth and width, so that richer, multi-scale features can be extracted; based on these features, the text characters can then be identified more finely and accurately.
The cross-layer connection feeds the low-layer features output by an earlier low-layer convolutional network layer directly into a later high-layer convolutional network layer, skipping the intermediate convolutional network layers. The later high-layer convolutional network layer can thus fuse the high-layer features from the immediately preceding convolutional network layer with these low-layer features and extract more accurate, more comprehensive features, effectively reducing the impact that feature information lost while the low-layer features pass through the intermediate convolutional layers would have on the subsequent identification and extraction of text characters.
In some embodiments, the low-layer convolutional network layer may specifically include three convolutional layers: a first convolutional layer, a second convolutional layer, and a third convolutional layer. The high-layer convolutional network layer may specifically include a fourth convolutional layer and a fifth convolutional layer. The fire module comprises a first fire module and a second fire module.
Specifically, as shown in fig. 3, the first convolutional layer may be a 5 × 5 convolutional layer, denoted conv1; the second convolutional layer may be a 3 × 3 convolutional layer, denoted conv2; the third convolutional layer may be a 1 × 1 convolutional layer, denoted conv3; the fourth convolutional layer may be a 3 × 3 convolutional layer, denoted conv4; and the fifth convolutional layer is denoted conv5.
The first fire module (denoted fire1) and the second fire module (denoted fire2) may be two fire modules having the same structure, each comprising a compression layer and an expansion layer connected in cascade.
Specifically, as shown in fig. 4, the compression layer may include one 1 × 1 convolutional layer (Kernel = 1 × 1, Num = s1; that is, the kernel size is 1 × 1 and the number of convolution kernels is s1) for compressing the input features. The expansion layer may include two convolutional layers connected in parallel: one 1 × 1 convolutional layer (the first expanded convolutional layer; Kernel = 1 × 1, Num = e1) and one 3 × 3 convolutional layer (the second expanded convolutional layer; Kernel = 3 × 3, Num = e3). A fusion structure for concatenating the features, denoted concat, is connected after the two parallel convolutional layers.
The number of convolution kernels of the compression convolutional layer may be set to s1, that of the first expanded convolutional layer to e1, and that of the second expanded convolutional layer to e3, where s1 is smaller than e1 + e3 (so that the compression layer genuinely compresses) and s1 is less than the number of image channels.
When the fire module of the above structure operates, the input features (e.g., of size H × W × M) are first compressed by the compression convolutional layer, which outputs the compressed features (e.g., of size H × W × s1). The compressed features are fed into the first expanded convolutional layer, which outputs first intermediate features (e.g., of size H × W × e1), and into the second expanded convolutional layer, which outputs second intermediate features (e.g., of size H × W × e3). The first and second intermediate features are then input into the fusion structure and spliced to obtain the fused expanded features (e.g., of size H × W × (e1 + e3), where e1 + e3 is the feature dimension).
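The fire module described above corresponds to a squeeze-and-expand block. The following is a minimal PyTorch sketch, assuming stride-1 convolutions and treating s1, e1, and e3 as free parameters, since their concrete values are not disclosed in this specification:

```python
import torch
import torch.nn as nn

class FireModule(nn.Module):
    """Squeeze-and-expand block: a 1x1 compression layer followed by parallel
    1x1 and 3x3 expansion layers whose outputs are concatenated (concat)."""
    def __init__(self, in_channels, s1, e1, e3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1, kernel_size=1)      # compression layer
        self.expand1x1 = nn.Conv2d(s1, e1, kernel_size=1)             # first expanded conv
        self.expand3x3 = nn.Conv2d(s1, e3, kernel_size=3, padding=1)  # second expanded conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                    # x: (N, M, H, W)
        x = self.relu(self.squeeze(x))       # -> (N, s1, H, W)
        a = self.relu(self.expand1x1(x))     # -> (N, e1, H, W)
        b = self.relu(self.expand3x3(x))     # -> (N, e3, H, W)
        return torch.cat([a, b], dim=1)      # -> (N, e1 + e3, H, W)
```

The padding of 1 on the 3 × 3 expanded convolution keeps the spatial size H × W, so the concatenated output matches the H × W × (e1 + e3) shape described above.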
By introducing and using the fire module of the above structure, the processing efficiency relating to the feature extraction can be improved; meanwhile, the convolutional neural network can be deepened in depth and width, the robustness of the network is improved, and the characteristics which are relatively richer and more in scale can be extracted based on the network.
In some embodiments, it is further considered that the target image containing the target text characters to be processed is usually relatively small, and too deep a network could suffer gradient vanishing during training, preventing a satisfactory preset character recognition model from being obtained. Therefore, only two fire modules, the first fire module and the second fire module, are combined in series as the overall fire module applied in the preset character recognition model.
In some embodiments, in the preset character recognition model, the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer and the fifth convolutional layer may be specifically sequentially connected in series; and a first fire module and a second fire module are sequentially connected in series between the fourth convolution layer and the fifth convolution layer.
Specifically, as shown in fig. 3, in the preset recognition model, the first convolution layer is connected to the second convolution layer, the second convolution layer is connected to the third convolution layer, the third convolution layer is connected to the fourth convolution layer, the fourth convolution layer is connected to the first fire module of the fire modules, and the first fire module is connected to the second fire module.
In addition, the predetermined character recognition model may further include an Input layer (denoted as Input). The input layer is connected with a first convolution layer, and the input layer is used for accessing a target image (which can be recorded as X) to be processed.
The preset character recognition model may further include one or more fully connected layers (for example, fully connected layer 1, fully connected layer 2, and fully connected layer 3 connected in series) and a Softmax classifier. These receive the features output by the fifth convolutional layer and, based on those features, compute and output the processing result (which may be denoted Y) for the target image.
Through the embodiment, the fire module is introduced and connected, the convolutional neural network can be deepened in depth and breadth, the robustness of the network is improved, and therefore the characteristics which are relatively richer and have more scales can be extracted in the following process.
In some embodiments, in order to further improve the recognition accuracy of the preset character recognition model, a cross-layer connection is further provided between the lower convolutional network layer and the higher convolutional network layer.
In some embodiments, a cross-layer connection may be provided between the third convolutional layer and the fifth convolutional layer; and/or a cross-layer connection is arranged between the first convolution layer and the fourth convolution layer.
In some embodiments, specifically, referring to fig. 3, a cross-layer connection is provided between the third convolutional layer (a lower convolutional network layer) and the fifth convolutional layer (a higher convolutional network layer), which may be referred to as a first cross-layer connection.
A plurality of intermediate layers, namely the fourth convolutional layer, the first fire module, and the second fire module, are arranged between the third convolutional layer and the fifth convolutional layer.
Through the first cross-layer connection, the features input to the fifth convolutional layer include not only the high-layer features from the second fire module but also the low-layer features from the third convolutional layer. The fifth convolutional layer can therefore fuse the two kinds of features and perform extraction on the fused features, making full use of the low-layer feature information and reducing the impact of feature information lost in the intermediate convolutional layers on subsequent character recognition. It thus obtains and outputs relatively more comprehensive and more effective features, from which the subsequent fully connected layers and Softmax classifier can derive a more accurate processing result.
In some embodiments, in order to further improve the model accuracy of the preset character recognition model, another cross-layer connection may be further provided between the first convolutional layer and the fourth convolutional layer, which may be referred to as a second cross-layer connection.
With the second cross-layer connection, the features input to the fourth convolutional layer include both the output features of the first convolutional layer and the output features of the third convolutional layer. Correspondingly, the fourth convolutional layer can fuse the two kinds of features and perform extraction on the fused features, obtaining relatively more comprehensive and more effective features that are output to the connected first fire module.
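To make the wiring of the two cross-layer connections concrete, the following sketch, reusing the FireModule class above (and its imports), shows one way to assemble conv1 through conv5 with fire1 and fire2. The channel counts, the single-channel input, and the classification head are placeholder assumptions not disclosed in the specification, and the pooling and dropout mentioned below are omitted; stride-1 "same" padding keeps the feature maps spatially aligned so the skip connections can be fused by channel concatenation:

```python
class PresetCharRecognitionModel(nn.Module):
    """Illustrative wiring of conv1..conv5, fire1/fire2, and the two
    cross-layer connections; all sizes are placeholder assumptions."""
    def __init__(self, num_classes, c=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, c, 5, padding=2)            # low-layer, 5x5
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)            # low-layer, 3x3
        self.conv3 = nn.Conv2d(c, c, 1)                       # low-layer, 1x1
        self.conv4 = nn.Conv2d(c + c, 2 * c, 3, padding=1)    # high-layer; sees conv3 + conv1
        self.fire1 = FireModule(2 * c, s1=16, e1=64, e3=64)
        self.fire2 = FireModule(128, s1=16, e1=64, e3=64)
        self.conv5 = nn.Conv2d(128 + c, 2 * c, 3, padding=1)  # high-layer; sees fire2 + conv3
        self.head = nn.Sequential(                            # stand-in for the FC layers
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * c, num_classes))                    # Softmax applied in the loss

    def forward(self, x):
        f1 = torch.relu(self.conv1(x))
        f2 = torch.relu(self.conv2(f1))
        f3 = torch.relu(self.conv3(f2))
        # second cross-layer connection: skip from conv1 into conv4
        f4 = torch.relu(self.conv4(torch.cat([f3, f1], dim=1)))
        f = self.fire2(self.fire1(f4))
        # first cross-layer connection: skip from conv3 into conv5
        f5 = torch.relu(self.conv5(torch.cat([f, f3], dim=1)))
        return self.head(f5)
```

If pooling layers were inserted after conv1, conv2, and conv4 as described below, the skipped features would first have to be downsampled to the same spatial size before concatenation.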
In some embodiments, a non-linear activation function may further be applied in one or more convolutional layers of the preset character recognition model to increase the expressive capability of the convolutional neural network.
In some embodiments, a corresponding pooling layer may be connected after one or more convolutional layers in the preset character recognition model, so that the output features of a convolutional layer are pooled before being input into the next convolutional layer, reducing the subsequent amount of data processing.
The pooling layers may use max pooling; pooling compresses the features, extracts the main features, and reduces overfitting.
Specifically, after the first convolution layer, the second convolution layer, and the fourth convolution layer, corresponding pooling layers may be connected, respectively, to perform pooling processing on the features output by the three convolution layers.
In some embodiments, the features output by one or more convolutional layers in the preset character recognition model may further be subjected to random deactivation (dropout), reducing the interaction between different convolutional layers and the dependency between the features they output, so that relatively better features can be extracted.
In some embodiments, the server may input the target image into the preset character recognition model and run the model. The low-layer convolutional network layer, the high-layer convolutional network layer, and the fire module in the model carry out the corresponding feature extraction to obtain relatively comprehensive, accurate, and effective features; the features are then input into the fully connected layers and the Softmax classifier; and the corresponding processing result is finally output.
In some embodiments, in order to obtain a processing result with higher accuracy, in specific implementation, the target image may be preprocessed first to obtain a preprocessed target image with better effect, in which invalid information is removed. And then, calling a preset character recognition model to process the preprocessed target image to obtain a corresponding processing result. The preprocessing may specifically include a noise reduction process, a batch normalization process, a segmentation process, and the like.
In some embodiments, the server may first perform noise reduction on the acquired target image to remove noise interference and obtain a denoised, relatively clean target image. The positions of the text characters in the target image can then be located, and the target image divided, according to those positions, into a plurality of sequentially arranged sub-images, each containing one text character. The sequentially arranged sub-images are then input into the preset character recognition model to obtain a more accurate processing result.
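A sketch of this preprocessing, using OpenCV, is given below. The specification does not fix a particular denoising or localization algorithm, so non-local-means denoising and contour-based character boxing are assumptions made for illustration:

```python
import cv2

def preprocess_and_split(image_path):
    """Denoise the target image, locate candidate character regions, and cut
    the image into left-to-right ordered per-character sub-images."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    denoised = cv2.fastNlMeansDenoising(gray, None, h=10)
    # binarize, then take external contours as candidate character regions
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours),
                   key=lambda b: b[0])       # order sub-images left to right
    return [denoised[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```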
S203: and determining the target text characters according to the processing result.
In some embodiments, the processing result may specifically include the pending characters with higher probabilities identified by the preset character recognition model, together with a score value for each pending character.
In some embodiments, in implementation, the pending character with the highest scoring value may be screened out as the target text character according to the processing result.
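As a small sketch of this screening step, assuming the model outputs one score per character class and `charset` is a hypothetical list mapping class indices to characters:

```python
import torch

def pick_target_character(logits, charset):
    """Screen out the pending character with the highest score value."""
    probs = torch.softmax(logits, dim=-1)    # scores for the pending characters
    score, idx = probs.max(dim=-1)
    return charset[idx.item()], score.item()
```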
In some embodiments, after determining the target text character according to the processing result, the method may further include: and carrying out specific data processing according to the target text characters.
In particular, for example, in a banking scenario, a ticket may be electronically documented based on the target text characters identified in the ticket. In addition, the fund data in the corresponding payer account and the corresponding payee account can be updated according to the key information such as the payer information, the payee information, the invoice amount and the like in the target text characters.
In some embodiments, before acquiring the target image to be processed, a sample image related to a complex recognition scene, such as a text character containing overlapping text characters, may be acquired, and a preset character recognition model meeting requirements may be established and trained based on the sample image.
In some embodiments, before acquiring the target image to be processed, when the method is implemented, the following may be further included:
s1: constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
s2: acquiring a sample image; wherein the sample image contains text characters with overlap;
s3: establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
s4: and training the initial model by using the marked training set and the test set to obtain a preset character recognition model meeting the requirements.
Through the embodiment, the preset character recognition model which has higher precision and can support recognition of text characters which are difficult to recognize, such as overlapped text characters, can be trained and established in advance.
In some embodiments, when the initial model is specifically constructed, referring to fig. 3, an initial fire module including an initial first fire module and an initial second fire module connected in series is introduced and set in the initial model. Meanwhile, cross-layer connection is introduced to connect the initial low-layer convolutional network layer and the initial high-layer convolutional network layer, so that an initial model with an improved model structure is obtained. And then, a preset character recognition model meeting the requirements can be obtained through subsequent training based on the initial model with the improved model structure.
In some embodiments, the obtaining of the sample image may include the following steps:
s1: collecting first picture data containing text characters;
s2: performing expansion processing on the first picture data to obtain second picture data;
s3: according to the text characters, the second picture data are segmented to obtain a plurality of third picture data;
s4: and screening out a sample image containing overlapped text characters from the plurality of third picture data.
By the embodiment, the sample images with good effect and meeting the requirements on quantity and quality can be obtained.
In some embodiments, when segmentation is performed specifically, the position of the text character may be located from the second picture data; and then, the second picture data is segmented according to the positions of the characters to obtain a plurality of third picture data. The third picture data thus obtained may be a small image containing only one normal text character, or may be a small image containing only one overlapping abnormal character.
Specifically, for example, suppose the text string "今天天气晴" (meaning "today's weather is sunny") is found in the second picture data. The position of each text character in the string can be determined in turn, and the second picture data divided, according to those positions, into 5 sequentially arranged small images: one containing "今", one containing "天", another containing "天", one containing "气", and one containing "晴". These 5 small images can be determined as the corresponding third picture data.
In some embodiments, considering that the collected effective first picture data is often relatively limited, the first picture data may be expanded first in order to train a preset character recognition model with better effect and higher precision. Specifically, the first picture data may be expanded into the second picture data by performing one or more kinds of processing such as translation, noise addition, and noise removal on the first picture data.
In some embodiments, in implementation, each text character in the second picture data may be detected first; and performing corresponding segmentation processing on the second picture data based on the text characters to ensure that each segmented picture data only contains a single text character, so that third picture data meeting requirements can be obtained.
In some embodiments, in specific implementation, for a complex recognition scene with overlapping text characters, third picture data including overlapping text characters may be specifically screened out from the plurality of third picture data, as the sample image.
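A sketch of the expansion step described above, with translation, noise addition, and smoothing as the named operations; all parameter values are illustrative assumptions:

```python
import cv2
import numpy as np

def expand_picture_data(img, rng=None):
    """Expand one piece of first picture data into several second-picture-data
    variants via translation, additive noise, and denoising-style smoothing."""
    rng = rng or np.random.default_rng(0)
    variants = [img]
    shift = int(rng.integers(-3, 4))
    variants.append(np.roll(img, shift, axis=1))                  # circular shift as a simple translation stand-in
    noisy = img.astype(np.float32) + rng.normal(0, 8, img.shape)  # additive Gaussian noise
    variants.append(np.clip(noisy, 0, 255).astype(np.uint8))
    variants.append(cv2.GaussianBlur(img, (3, 3), 0))             # smoothing
    return variants
```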
In some embodiments, after obtaining the sample image, when the method is implemented, the following may be further included: calculating the average value and the variance of the sample image according to the sample image; and carrying out batch standardization processing on the sample images according to the mean value and the variance of the sample images.
By the embodiment, the image data of the sample images can be unified into the same scale range firstly, and then the subsequent model training is carried out, so that the error influence of the image data of different sample images on the subsequent model training due to the difference of factors such as scales can be effectively reduced, and the training precision in the subsequent model training is improved.
In some embodiments, the average of the sample images may be calculated according to the following equation:

β = (1/m) · Σ_{i=1}^{m} x_i

where β is the average of the sample images, x_i is the image data value of the sample image numbered i, m is the total number of sample images, and i is the number of the sample image.
In some embodiments, the variance of the sample images may be calculated according to the following equation:

γ² = (1/m) · Σ_{i=1}^{m} (x_i − β)²

where γ² is the variance of the sample images.
In some embodiments, the sample images may be batch-normalized according to the following equation:

ω_i = (x_i − β) / √(γ² + ε)

where ω_i is the batch-normalized image data value of the sample image numbered i, and ε is a small positive number used to avoid division by zero.
Through this embodiment, the image data values of different sample images can be standardized into a normal distribution with mean 0 and variance 1, reducing the influence of excessive differences in the image data of the sample images on model training; at the same time, the amount of calculation involved in subsequent model training is reduced, model convergence is accelerated, and training efficiency is improved.
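The three equations above can be implemented directly. A minimal NumPy sketch follows; computing the statistics per pixel over the whole batch is an assumption, since the text does not state whether the mean and variance are taken per pixel or per image:

```python
import numpy as np

def batch_standardize(images, eps=1e-5):
    """Batch-standardize sample images: beta = mean, gamma^2 = variance,
    omega_i = (x_i - beta) / sqrt(gamma^2 + eps)."""
    x = np.stack([img.astype(np.float32) for img in images])  # shape (m, H, W)
    beta = x.mean(axis=0)         # average of the sample images
    var = x.var(axis=0)           # variance (gamma^2)
    return (x - beta) / np.sqrt(var + eps)
```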
In some embodiments, a preset proportion (e.g., 70%) of the sample images may be randomly extracted as the training set, with the remaining sample images used as the test set.
In some embodiments, the labeling may be performed by annotating each training-set sample image containing overlapping text characters with the real characters it contains, thereby obtaining the labeled training set.
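A sketch of the split and the subsequent training loop is given below. The 70% ratio comes from the example above; the Adam optimizer is an assumption, as the specification mentions only a gradient descent method without fixing an optimizer:

```python
import random
import torch
import torch.nn as nn

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Randomly draw the preset proportion of sample images as the training
    set; the remaining samples form the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(train_ratio * len(idx))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

def train(model, train_loader, epochs=10, lr=1e-3):
    """Train the initial model on the labeled training set with
    cross-entropy loss and gradient-based updates."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:   # labels from the labeled set
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```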
In some embodiments, when the preset character recognition model is specifically trained, the method may further include: determining the receptive field range of the features; and adjusting the size parameters of the convolution kernels used by the initial low-layer convolutional network layer and the initial high-layer convolutional network layer according to the receptive field range of the features.
The receptive field may be understood as the region of the input image that a feature of the convolutional neural network can see.
In the specific adjustment, in the case that the range of the receptive field of the features of the input convolutional network layer is determined to be large, a convolutional kernel with a relatively large size may be used as the convolutional kernel used by the convolutional network layer to preferentially extract the global features. In contrast, in the case where the range of the receptive field determining the features of the input convolutional network layer is small, a convolution kernel having a relatively small size may be used as the convolution kernel used by the convolutional network layer to preferentially extract the local features.
Through the embodiment, the convolution kernel is adjusted in a targeted manner according to the receptive field, so that the preset character recognition model with high precision can be trained relatively more efficiently.
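The receptive field of a stack of convolution (and pooling) layers can be computed with the standard recurrence, which is what the adjustment above relies on; the kernel sizes here follow fig. 3, with all strides 1 and pooling omitted for simplicity:

```python
def receptive_field(layers):
    """Receptive field of stacked layers given as (kernel_size, stride)
    pairs: r grows by (k - 1) * jump per layer, and jump scales by stride."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

# conv1 (5x5), conv2 (3x3), conv3 (1x1), all stride 1:
assert receptive_field([(5, 1), (3, 1), (1, 1)]) == 7
```

A feature after conv3 thus sees a 7 × 7 region of the input under these assumptions; a larger receptive field argues for larger kernels to capture global features, a smaller one for smaller kernels focused on local features.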
As can be seen from the above, with the text character recognition method provided in the embodiments of the present specification, a preset character recognition model comprising at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, with a cross-layer connection between the low-layer and high-layer convolutional network layers, can be trained and established in advance. After a target image to be processed is acquired, the preset character recognition model can be called to process the target image and obtain a corresponding processing result, from which the target text characters contained in the target image are determined. The called model supports multi-scale feature extraction and performs well, so the method suits complex recognition scenes such as overlapping text characters, recognizes text characters in images accurately and efficiently, reduces recognition errors, and improves the accuracy of character recognition.
Referring to fig. 5, another text character recognition method is further provided in the embodiments of the present disclosure. When the method is implemented, the following contents can be included:
s501: acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
s502: calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model comprises at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module;
s503: and determining the target text characters according to the processing result.
As can be seen from the above, in the text character recognition method provided in this embodiment of the present disclosure, a preset character recognition model comprising at least a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module can be trained in advance. After a target image to be processed is acquired, the preset character recognition model can be called to process the target image and obtain a corresponding processing result, and the target text characters contained in the target image are determined from that result. The method is therefore suitable for complex recognition scenes such as overlapping text characters and can accurately and efficiently recognize the text characters in the image.
Referring to fig. 6, an embodiment of the present disclosure further provides a method for establishing a preset character recognition model. When the method is implemented, the following contents can be included:
s601: constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
s602: acquiring a sample image; wherein the sample image contains text characters with overlap;
s603: establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
s604: and training the initial model by using the marked training set and the test set to obtain a preset character recognition model meeting the requirements.
Through the embodiment, the preset character recognition model which is suitable for complex recognition scenes such as overlapped text characters and the like, high in recognition accuracy and good in effect can be trained.
The embodiments of the present specification further provide a method for establishing the preset character recognition model, which, when implemented, may include the following: constructing an initial model, where the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer, and an initial fire module; acquiring a sample image, where the sample image contains text characters with overlap; establishing a training set and a test set according to the sample image, and labeling the sample images in the training set to obtain a labeled training set; and training the initial model by using the labeled training set and the test set to obtain a preset character recognition model meeting the requirements.
Embodiments of the present specification further provide a server, including a processor and a memory for storing processor-executable instructions, where the processor, when implemented, may perform the following steps according to the instructions: acquiring a target image to be processed, wherein the target image contains target text characters to be recognized; calling a preset character recognition model to process the target image to obtain a corresponding processing result, wherein the preset character recognition model at least comprises a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer; and determining the target text characters according to the processing result.
In order to execute the above instructions more accurately, referring to fig. 7, another specific server is provided in the embodiments of the present specification. The server includes a network communication port 701, a processor 702, and a memory 703, which are connected by internal cables so that they can perform specific data interaction.
The network communication port 701 may be specifically configured to acquire a target image to be processed; wherein the target image contains target text characters to be recognized.
The processor 702 may be specifically configured to call a preset character recognition model to process the target image to obtain a corresponding processing result, wherein the preset character recognition model at least comprises a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer; and to determine the target text characters according to the processing result.
The memory 703 may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication port 701 may be a virtual port bound to different communication protocols so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, FTP data communication, or mail data communication. The network communication port may also be a physical communication interface or communication chip; for example, it may be a wireless mobile network communication chip, such as a GSM or CDMA chip, a Wi-Fi chip, or a Bluetooth chip.
In this embodiment, the processor 702 may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The description is not intended to be limiting.
In this embodiment, the memory 703 may be understood at multiple levels: in a digital system, anything that can store binary data may serve as a memory; in an integrated circuit, a circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; and in a system, a storage device in physical form is also called a memory, such as a memory bank or a TF card.
The present specification further provides a computer storage medium based on the above text character recognition method, where the computer storage medium stores computer program instructions that, when executed, implement: acquiring a target image to be processed, wherein the target image contains target text characters to be recognized; calling a preset character recognition model to process the target image to obtain a corresponding processing result, wherein the preset character recognition model at least comprises a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer; and determining the target text characters according to the processing result.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparing with other embodiments, and are not described herein again.
Referring to fig. 8, in a software level, an embodiment of the present specification further provides a text character recognition apparatus, which may specifically include the following structural modules:
the obtaining module 801 may be specifically configured to obtain a target image to be processed; wherein the target image contains target text characters to be recognized;
the calling module 802 may be specifically configured to call a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model at least comprises a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer;
the determining module 803 may be specifically configured to determine the target text character according to the processing result.
It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Therefore, the text character recognition device provided by the embodiment of the specification can be well suitable for complex recognition scenes such as overlapped text characters, accurately and efficiently recognize and determine the text characters in the image, reduce recognition errors and improve the accuracy of character recognition.
In a specific scenario example, the text character recognition method provided in the present specification may be applied to recognize and acquire the text characters filled into a bill by a user when transacting business at a bank. The specific implementation process may be as follows.
Specifically, referring to fig. 9, a convolutional neural network model (for example, the preset character recognition model) that supports recognition of overlapped handwritten characters in bank bills may be obtained by training according to the following steps.
Step one, data preparation.
In this step, various bills of the bank can be collected first and photographed with a camera at multiple angles and under different illuminations, so as to convert the paper bills into electronic images (for example, first picture data); the data is then expanded by image processing methods such as translation and noise addition (to obtain second picture data); next, the images are segmented by character to obtain a plurality of images each containing a single character (for example, third picture data); the images containing overlapped characters are screened out of these (for example, sample images); and one part of the overlapped-character images is labeled to serve as a training set, while the other part is labeled to serve as a test set.
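As a minimal sketch of the expansion step (from first picture data to second picture data), assuming grayscale images held as NumPy arrays; the shift amount and noise level are illustrative assumptions:

```python
import numpy as np

def augment(image: np.ndarray, shift: int = 2, noise_std: float = 5.0) -> list:
    """Expand one bill image by translation and additive Gaussian noise."""
    shifted = np.roll(image, shift, axis=1)                        # horizontal translation
    noisy = image + np.random.normal(0.0, noise_std, image.shape)  # additive noise
    noisy = np.clip(noisy, 0, 255).astype(image.dtype)
    return [shifted, noisy]
```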
And step two, preprocessing the data.
In this step, it is considered that the image data obtained in step one may vary in size or contain noise, resulting in large differences in image quality, and image quality directly affects the accuracy of subsequent character recognition. Therefore, before model training, the image data can be preprocessed to eliminate irrelevant information, retain the real information in the image, and enhance the detectability of the relevant information; the data is also simplified as much as possible, thereby improving the reliability of feature extraction, image segmentation, matching, and recognition, and yielding a processed training set and a processed test set.
Specifically, the obtained image data is subjected to batch standardization processing, so that the image data is converted into a normal distribution with mean 0 and variance 1; this reduces the subsequent calculation amount and accelerates the convergence of the model network.
The specific calculation process is as follows, where m represents the number of image samples and x_i represents the data value of the i-th image sample:

μ = (1/m) Σ_i x_i,    σ² = (1/m) Σ_i (x_i − μ)²,    x̂_i = (x_i − μ) / √(σ² + ε)

where μ and σ² are the mean and variance of the batch, x̂_i is the standardized value, and ε is a small constant that avoids division by zero.
After the image samples are preprocessed in the above way, they are used for subsequent model training.
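A small NumPy sketch of the batch standardization above, assuming the image samples are stacked into a single array; epsilon is the small constant guarding against zero variance:

```python
import numpy as np

def batch_standardize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize a batch of image samples (shape (m, H, W)) to zero mean, unit variance."""
    mu = x.mean()
    var = x.var()
    return (x - mu) / np.sqrt(var + eps)
```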
And step three, training a model.
In this step, when the model is constructed, a new convolutional neural network model can be obtained by introducing the fire module of the compressed convolutional neural network (SqueezeNet). Based on this module, the shallow convolutional neural network can be deepened in both depth and width, convolutional features of different scales can be cascaded, the multi-scale features of the image can be extracted, the feature expression capability of the shallow convolutional neural network is enriched, and the recognition rate of overlapped handwritten characters is improved. Furthermore, when the model is constructed, the idea of cross-layer connection is introduced to fuse the low-layer features and the high-layer features, so that the features extracted by the low-layer network layers (for example, the low-layer convolutional network layer) can be fully utilized.
In addition, random deactivation (dropout) is applied to the convolutional layers to reduce the interaction between neurons and the dependency between features, thereby further improving the accuracy of overlapped handwritten character recognition.
The process of model training is described in detail below with reference to fig. 3 and fig. 4. Model training can be performed using the processed training set, and testing can be performed using the processed test set to obtain a trained model.
Specifically, referring to fig. 3, the convolutional-neural-network-based model for recognizing overlapped handwritten characters in bank bills constructed in this scenario example may include: 1 input layer, 5 convolutional layers, 2 fire modules, 3 fully-connected layers, 1 output layer, and cross-layer connections.
The input layer is mainly used to input the training set processed in step two. The data in the training set may specifically be images of size n × m × H, where H is the number of channels.
In this scenario example, the convolutional layers perform local computations on the input image using the corresponding convolution kernels. Convolution can be understood as a special matrix-based linear operation, with the following calculation formula:

S(i, j) = (X * W)(i, j) = Σ_m Σ_n x(i+m, j+n) w(m, n)

where X is the input matrix and W is the convolution kernel matrix.
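The following NumPy sketch implements the formula above directly (a sliding window over the input with valid padding and stride 1, as convolution is typically realized in convolutional neural networks):

```python
import numpy as np

def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """S(i, j) = sum_m sum_n x(i+m, j+n) * w(m, n), valid padding, stride 1."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    s = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            s[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return s
```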
In general, low-level features such as color and size can be extracted by the convolution calculations of the low-layer convolutional layers (for example, the low-layer convolutional network layer); the features extracted by the lower layers can then be further transformed non-linearly by the high-layer convolutional layers (for example, the high-layer convolutional network layer) to extract high-level features such as texture.
In addition, convolutional layers realize local connection and weight sharing. Local connection means that a node of a convolutional layer is connected to only some of the neurons of the previous layer and is used to extract local features; weight sharing means that the same convolution kernel is used to convolve the feature map. These two characteristics reduce the number of parameters and the computational complexity.
In this scenario example, it is also considered that when the convolutional layers use different convolution kernel sizes, the receptive fields of the neurons corresponding to the overlapped handwritten character images differ; that is, the regions of the original image to which the pixels of the output feature map are mapped differ in size. Generally, the larger the convolution kernel, the larger the receptive field in the original image of the overlapped handwritten characters, and the larger the parameter quantity of the network. Specifically, when the receptive field is small, the neural network extracts local features of the overlapped handwritten character images; when the receptive field is large, it extracts global features. Therefore, the convolution kernels used by the convolutional layers can be flexibly adjusted according to the desired receptive field.
In this scenario example, to extract richer features of the overlapped handwritten character images while reducing the amount of computation, convolution kernels of sizes 5 × 5, 3 × 3, and 1 × 1 may be used. The feature map output by the input layer first passes through four convolutional layers of 5 × 5, 3 × 3, 1 × 1, and 3 × 3 (that is, the first, second, third, and fourth convolutional layers) in sequence, and batch normalization is performed on the output of each convolutional layer so that the input of each layer of the convolutional neural network keeps the same distribution.
In addition, the non-linear activation function ReLU may be used in each convolutional layer to increase the expressive power of the neural network. To reduce the number of parameters, the feature maps of the overlapped handwritten characters output by the 1st, 2nd, and 4th convolutional layers may be pooled; the pooling layers compress the feature maps by max pooling, extracting the main features and reducing overfitting.
In this scenario example, the fire module may specifically include a compression (squeeze) layer and an expansion layer. The compression layer compresses the feature map through a set of 1 × 1 convolutions; the expansion layer performs convolution operations with 1 × 1 and 3 × 3 convolutions respectively, and the two outputs are then cascaded into one output.
As shown in fig. 4, if the feature map input to the fire module is H × W × M, the calculation process of the fire module is as follows: first, the feature map is processed by the compression layer, whose convolution kernel size is 1 × 1, and the resulting feature map has size H × W × s1; next, this feature map is input to the expansion layer and convolved by a 1 × 1 convolutional layer and a 3 × 3 convolutional layer respectively; finally, the two results are cascaded, and the finally obtained feature map has size H × W × (e1 + e3). Here, s1, e1, and e3 respectively represent the number of convolution kernels of the corresponding convolutional layers (and thus the channel dimensions of the corresponding output feature maps), and satisfy the relations s1 < M and s1 < e1 + e3, so that the compression layer reduces the channel dimension before the expansion layer restores it.
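A hedged PyTorch sketch of the fire module described above; the ReLU activations are an assumption consistent with common practice, and the channel counts s1, e1, e3 are supplied by the caller:

```python
import torch
from torch import nn

class Fire(nn.Module):
    def __init__(self, in_channels: int, s1: int, e1: int, e3: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1, kernel_size=1)          # compression layer
        self.expand1 = nn.Conv2d(s1, e1, kernel_size=1)                   # 1x1 expansion
        self.expand3 = nn.Conv2d(s1, e3, kernel_size=3, padding=1)        # 3x3 expansion
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.squeeze(x))                  # H x W x M -> H x W x s1
        return torch.cat([self.relu(self.expand1(x)),   # cascade the two expansion outputs:
                          self.relu(self.expand3(x))],  # H x W x (e1 + e3)
                         dim=1)
```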
Based on the fire module, the traditional convolutional neural network can be deepened in depth and width at the same time, the robustness of the network is increased, the network can extract richer features, and the accuracy of model identification is improved.
In this scenario example, the fully-connected layer may be located at the tail of the convolutional neural network. Each neuron is connected with all neurons of the adjacent layer, and the discriminative key feature information extracted from the convolutional layer or the pooling layer is integrated and then transmitted to a classifier (Softmax) for classification processing.
Specifically, assuming that the vector formed by the input nodes is x with dimension N, and the vector formed by the output nodes is y with dimension M, the calculation formula of the fully-connected layer is expressed as follows:

y = Wx

where W is the weight matrix, with dimension M × N.
In this scenario example, the output layer may employ a Softmax classifier, which is mainly used for multi-class classification.
Specifically, the classifier can calculate the error between the predicted value and the true value by using the logarithmic (log) loss function, and then update the relevant parameters of the network by gradient descent according to this error, so as to complete the training of the network model.
The log loss function may specifically refer to the following formula:

Loss = -(y_i log(s_i) + (1 - y_i) log(1 - s_i))

where y_i represents the true value of the i-th category and can only take the value 0 or 1, and s_i represents the predicted probability that the input data belongs to the i-th category. The calculation formula of s_i can specifically refer to the following formula:

s_i = exp(z_i) / Σ_j exp(z_j)

where z_i represents the output of the i-th neuron.
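A NumPy sketch of the Softmax calculation and log loss above; subtracting the maximum inside the softmax is only for numerical stability and does not change the result:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """s_i = exp(z_i) / sum_j exp(z_j)."""
    e = np.exp(z - z.max())
    return e / e.sum()

def log_loss(y: np.ndarray, s: np.ndarray) -> float:
    """Loss = -(y_i * log(s_i) + (1 - y_i) * log(1 - s_i)), summed over categories."""
    return float(-np.sum(y * np.log(s) + (1 - y) * np.log(1 - s)))
```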
In this scenario example, the cross-layer connection may specifically mean that a lower network layer skips the network layer directly connected to it (for example, an intermediate network layer) and connects directly to a higher network layer, so that the low-layer features and high-layer features can be fused in the higher layers and the feature information extracted by the lower network layers can be fully utilized, thereby improving the accuracy of overlapped handwritten character recognition.
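Putting the pieces together, the following PyTorch sketch assembles the architecture of fig. 3: five convolutional layers, two fire modules (reusing the Fire class sketched above), pooling after the 1st, 2nd, and 4th convolutional layers, dropout, three fully-connected layers, and a cross-layer connection that carries the third convolutional layer's features past the fire modules for fusion before the fifth convolutional layer. The input size (1 × 32 × 32), channel counts, and fusion point are illustrative assumptions, not the patent's exact configuration.

```python
import torch
from torch import nn

class OverlapCharNet(nn.Module):
    """Sketch of fig. 3 for 1 x 32 x 32 inputs; Softmax is applied in the loss."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 32, 5, padding=2), nn.BatchNorm2d(32),
                                   nn.ReLU(), nn.MaxPool2d(2))            # pooled, per step three
        self.conv2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64),
                                   nn.ReLU(), nn.MaxPool2d(2))
        self.conv3 = nn.Sequential(nn.Conv2d(64, 64, 1), nn.BatchNorm2d(64), nn.ReLU())
        self.conv4 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128),
                                   nn.ReLU(), nn.Dropout2d(0.25), nn.MaxPool2d(2))
        self.fire1 = Fire(128, 32, 64, 64)    # Fire as sketched earlier
        self.fire2 = Fire(128, 32, 64, 64)
        self.skip_pool = nn.MaxPool2d(2)      # match spatial sizes on the cross-layer path
        self.conv5 = nn.Sequential(nn.Conv2d(128 + 64, 128, 3, padding=1), nn.ReLU())
        self.fc = nn.Sequential(nn.Linear(128 * 4 * 4, 256), nn.ReLU(), nn.Dropout(0.5),
                                nn.Linear(256, 128), nn.ReLU(),
                                nn.Linear(128, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv2(self.conv1(x))                   # low-layer convolutional network layers
        low = self.conv3(x)                             # features reused via cross-layer connection
        x = self.fire2(self.fire1(self.conv4(low)))
        x = torch.cat([x, self.skip_pool(low)], dim=1)  # fuse low- and high-layer features
        return self.fc(torch.flatten(self.conv5(x), 1))
```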
And step four, outputting the result.
In specific use, the test set obtained in step two can be input into the convolutional neural network model trained in step three, and the probability s_i calculated by the Softmax classifier is used as the image recognition result.
As verified by this scenario example, the text character recognition method provided in this specification first introduces the fire module of the compressed convolutional neural network to deepen the shallow convolutional neural network in both depth and width, so that convolutional features of different scales can be cascaded, the multi-scale features of the image extracted, the feature expression capability of the shallow convolutional neural network enriched, and the accuracy of overlapped handwritten character recognition improved. Second, by introducing the idea of cross-layer connection and fusing the low-layer features with the high-layer features, the features extracted by the low-layer network can be fully utilized, which improves the accuracy of the convolutional neural network in recognizing overlapped handwritten characters on bank bills, enables automatic, accurate, and efficient recognition and extraction of overlapped handwritten text characters, and improves service processing efficiency.
Although the present specification provides method steps as described in the examples or flowcharts, additional or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names, but not any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus necessary general hardware platform. With this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments in the present specification.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.
Claims (14)
1. A method for recognizing text characters, comprising:
acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model at least comprises: a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer;
and determining the target text characters according to the processing result.
2. The method of claim 1, wherein the lower convolutional network layer comprises: a first convolution layer, a second convolution layer and a third convolution layer; the high-level convolutional network layer includes: a fourth convolutional layer and a fifth convolutional layer; the fire module includes a first fire module and a second fire module.
3. The method of claim 2, wherein the first, second, third, fourth, and fifth convolutional layers are sequentially connected in series; and a first fire module and a second fire module are sequentially connected in series between the fourth convolution layer and the fifth convolution layer.
4. The method of claim 3, wherein a cross-layer connection is provided between the third convolutional layer and the fifth convolutional layer; and/or a cross-layer connection is arranged between the first convolution layer and the fourth convolution layer.
5. The method of claim 4, wherein prior to acquiring the target image to be processed, the method further comprises:
constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
acquiring a sample image; wherein the sample image contains text characters with overlap;
establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
and training the initial model by using the labeled training set and the test set to obtain a preset character recognition model meeting the requirements.
6. The method of claim 5, wherein acquiring a sample image comprises:
collecting first picture data containing text characters;
performing expansion processing on the first picture data to obtain second picture data;
according to the text characters, the second picture data are segmented to obtain a plurality of third picture data;
and screening out a sample image containing overlapped text characters from the plurality of third picture data.
7. The method of claim 5, further comprising:
determining the receptive field range of the characteristics;
and adjusting the size parameters of convolution kernels used by the initial low-layer convolution network layer and the initial high-layer convolution network layer according to the receptive field range of the characteristics.
8. The method of claim 5, wherein after acquiring the sample image, the method further comprises:
calculating the average value and the variance of the sample image according to the sample image;
and carrying out batch standardization processing on the sample images according to the mean value and the variance of the sample images.
9. The method of claim 1, wherein the target image comprises at least one of: pictures containing bills, pictures containing certificates and pictures containing contracts.
10. A method for recognizing text characters, comprising:
acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model at least comprises: a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module;
and determining the target text characters according to the processing result.
11. A method for establishing a preset character recognition model is characterized by comprising the following steps:
constructing an initial model; the initial model at least comprises an initial low-layer convolutional network layer, an initial high-layer convolutional network layer and an initial fire module, and cross-layer connection is arranged between the initial low-layer convolutional network layer and the initial high-layer convolutional network layer;
acquiring a sample image; wherein the sample image contains text characters with overlap;
establishing a training set and a testing set according to the sample image; labeling the sample images in the training set to obtain a labeled training set;
and training the initial model by using the labeled training set and the test set to obtain a preset character recognition model meeting the requirements.
12. An apparatus for recognizing text characters, comprising:
the acquisition module is used for acquiring a target image to be processed; wherein the target image contains target text characters to be recognized;
the calling module is used for calling a preset character recognition model to process the target image to obtain a corresponding processing result; wherein the preset character recognition model at least comprises: a low-layer convolutional network layer, a high-layer convolutional network layer, and a fire module, and a cross-layer connection is further provided between the low-layer convolutional network layer and the high-layer convolutional network layer;
and the determining module is used for determining the target text characters according to the processing result.
13. A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 9.
14. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 9.