CN111898620A - Training method of recognition model, character recognition method, device, equipment and medium - Google Patents

Training method of recognition model, character recognition method, device, equipment and medium Download PDF

Info

Publication number
CN111898620A
CN111898620A
Authority
CN
China
Prior art keywords
network
sample
feature extraction
time sequence
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010672462.3A
Other languages
Chinese (zh)
Inventor
冯晓锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010672462.3A
Publication of CN111898620A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method of a recognition model, a character recognition method, a device, equipment and a medium. The method comprises the following steps: training an initial feature extraction network of the recognition model by adopting a first teacher network according to sample character images to obtain a feature extraction network; training an initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network, wherein the parameters of the feature extraction network are fixed during the training of the initial time sequence network; and obtaining the recognition model according to the feature extraction network and the time sequence network. By adopting the method, the time consumed by training can be reduced, the network structure of the obtained recognition model is simpler, and both the recognition efficiency and the performance of the finally obtained recognition model are improved.

Description

Training method of recognition model, character recognition method, device, equipment and medium
Technical Field
The present application relates to the field of character recognition technologies, and in particular, to a training method for a recognition model, a character recognition method, an apparatus, a device, and a medium.
Background
Optical Character Recognition (OCR) models are increasingly used in a variety of scenarios. In order to adapt an OCR recognition model to these various application scenarios, it needs to be optimized repeatedly to increase its generality.
In the conventional technology, in order to increase the generality of the OCR recognition model, a larger network is usually used; the idea of knowledge distillation is then applied to train and optimize this larger network, obtaining a small network with comparable performance, and the obtained small network replaces the large network to complete the optimization of the OCR recognition model.
However, the conventional optimization method for the OCR recognition model has the problem of long time consumption.
Disclosure of Invention
In view of the above technical problems, it is desirable to provide a training method of a recognition model, a character recognition method, an apparatus, a device, and a medium capable of shortening the optimization time of an OCR recognition model.
A training method of a recognition model, the method comprising:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining a recognition model according to the feature extraction network and the time sequence network.
In one embodiment, the training an initial feature extraction network of the recognition model by using a first teacher network according to the sample character images to obtain a feature extraction network includes:
inputting the sample character image into a feature extraction network of the first teacher network, obtaining a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputting the first feature map into a time sequence network of the first teacher network to obtain a first recognition result; the first recognition result is a recognition result of the first teacher network on characters in the sample character image;
inputting the sample character image into the initial feature extraction network, obtaining a first sample feature map of the sample character image through the initial feature extraction network, and inputting the first sample feature map into the initial time sequence network to obtain a first sample recognition result;
obtaining a value of a loss function of the initial feature extraction network according to the first feature map, the first sample feature map and the first sample recognition result;
and training the initial feature extraction network according to the value of the loss function of the initial feature extraction network to obtain the feature extraction network.
In one embodiment, the loss function of the initial feature extraction network is calculated as: L1 = β * smoothL1loss + μ * CrossEntropy(y, pre), where L1 represents the loss function of the initial feature extraction network, β and μ are parameters, y represents the standard recognition result corresponding to the sample character image, pre represents the first sample recognition result, CrossEntropy represents the cross entropy loss function, and smoothL1loss is the loss value obtained from the first feature map and the first sample feature map.
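As a hedged illustration (not part of the patent text), the feature-stage loss above can be computed as follows; the function and variable names, the mean reduction over feature elements, and the one-hot label representation are assumptions made for this sketch:

```python
import math

def smooth_l1(teacher_feat, student_feat):
    """Smooth-L1 distance between two flattened feature maps, averaged
    over elements (the smoothL1loss term of the formula above)."""
    total = 0.0
    for t, s in zip(teacher_feat, student_feat):
        d = abs(t - s)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(teacher_feat)

def cross_entropy(label_index, probs):
    """Cross entropy of predicted probabilities against a one-hot label
    given by its index (the CrossEntropy(y, pre) term)."""
    return -math.log(probs[label_index])

def feature_stage_loss(teacher_feat, student_feat, label_index, student_probs,
                       beta=1.0, mu=1.0):
    # L1 = beta * smoothL1loss + mu * CrossEntropy(y, pre)
    return (beta * smooth_l1(teacher_feat, student_feat)
            + mu * cross_entropy(label_index, student_probs))
```

With β = μ = 1, identical teacher and student feature maps contribute zero smooth-L1 loss, and the remaining term reduces to the ordinary cross entropy on the student's prediction.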
In one embodiment, the training the initial time series network of the recognition model by using a second teacher network according to the sample character images to obtain a time series network includes:
inputting the sample character image into a feature extraction network of the second teacher network, obtaining a second feature map of the sample character image through the feature extraction network of the second teacher network, inputting the second feature map into a time sequence network of the second teacher network, obtaining a prediction probability value of the second feature map, and obtaining a second recognition result according to the prediction probability value of the second feature map; the second recognition result is a recognition result of the second teacher network on characters in the sample character image;
inputting the sample character image into the feature extraction network, obtaining a second sample feature map of the sample character image through the feature extraction network, inputting the second sample feature map into an initial time sequence network of the recognition model, obtaining a prediction probability value of the second sample feature map, and obtaining a second sample recognition result according to the prediction probability value of the second sample feature map;
obtaining a value of a loss function of the initial time sequence network according to the prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample recognition result and the standard recognition result corresponding to the sample character image;
and training the initial time sequence network according to the value of the loss function of the initial time sequence network to obtain the time sequence network.
In one embodiment, the loss function of the initial time sequence network is calculated by the following formula:

L2 = α * T² * KLdiv(q_T, q_S) + (1 − α) * CrossEntropy(Q_S, y_true)

in the formula, L2 represents the loss function of the initial time sequence network, α represents the weight, T represents the temperature, KLdiv represents the KL divergence, q_S represents the prediction probability value of the second sample feature map (softened by the temperature T), q_T represents the prediction probability value of the second feature map (softened by the temperature T), CrossEntropy represents the cross entropy loss function, Q_S represents the second sample recognition result, and y_true represents the standard recognition result corresponding to the sample character image.
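A minimal numeric sketch of this distillation loss, assuming the standard temperature-scaled formulation (soft targets via a temperature-T softmax, a KL term scaled by T², plus a hard cross-entropy term); all names and default values are illustrative, not the patent's reference implementation:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def timing_stage_loss(student_logits, teacher_logits, label_index,
                      alpha=0.5, T=4.0):
    # L2 = alpha * T^2 * KLdiv(q_T, q_S) + (1 - alpha) * CrossEntropy(Q_S, y_true)
    q_s = softmax(student_logits, T)   # softened student prediction
    q_t = softmax(teacher_logits, T)   # softened teacher prediction
    hard = -math.log(softmax(student_logits)[label_index])
    return alpha * T * T * kl_div(q_t, q_s) + (1 - alpha) * hard
```

When the student's logits already match the teacher's exactly, the KL term vanishes and only the (1 − α)-weighted hard loss remains.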
In one embodiment, the network structure of the first teacher network includes resnet50 and a bidirectional long short-term memory network; the network structure of the second teacher network includes resnet18 and a bidirectional long short-term memory network; the network structure of the recognition model includes at least one resblock network and a long short-term memory network.
A method of character recognition, the method comprising:
acquiring a character image to be recognized;
inputting the character image to be recognized into a feature extraction network of a preset recognition model, and obtaining the features of the character image to be recognized through the feature extraction network;
inputting the characteristics of the character image to be recognized into a time sequence network of the recognition model, and obtaining the recognition result of the character image to be recognized through the time sequence network;
the recognition model is obtained by training an initial feature extraction network of the recognition model with a first teacher network and training an initial time sequence network of the recognition model with a second teacher network; during the training of the initial time sequence network, the parameters of the feature extraction network are fixed.
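The inference path described above can be sketched as follows; `recognize`, the two network callables, and the greedy per-timestep decoding are hypothetical (the claim does not specify how the time sequence network's outputs are decoded into characters):

```python
def recognize(image, feature_net, timing_net, charset):
    """Two-stage inference: extract features, then decode the per-timestep
    character distributions produced by the time sequence network."""
    features = feature_net(image)   # e.g. a sequence of per-column feature vectors
    probs = timing_net(features)    # one probability distribution per timestep
    # Greedy decoding: pick the most probable character at each timestep.
    return "".join(charset[max(range(len(p)), key=p.__getitem__)] for p in probs)
```

With stub networks that emit two timestep distributions over the charset "ab", the most probable entry of each timestep determines the output string.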
A training apparatus for recognizing a model, the apparatus comprising:
the first training module is used for training the initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
the second training module is used for training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and the first acquisition module is used for obtaining a recognition model according to the feature extraction network and the time sequence network.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining a recognition model according to the feature extraction network and the time sequence network.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining a recognition model according to the feature extraction network and the time sequence network.
According to the training method of the recognition model, the character recognition method, the device, the equipment and the medium, a first teacher network is adopted to train the initial feature extraction network of the recognition model according to the sample character images to obtain the feature extraction network; the parameters of the obtained feature extraction network are then fixed, and a second teacher network is adopted to train the initial time sequence network of the recognition model according to the sample character images to obtain the time sequence network; the recognition model is obtained according to the obtained feature extraction network and time sequence network. Because the initial feature extraction network and the initial time sequence network are trained separately by the first teacher network and the second teacher network, rather than simultaneously, the time consumed by the training process is reduced. In addition, the network structures of the initial feature extraction network and the initial time sequence network are simple, and after they are trained to stability the recognition model is built from the resulting feature extraction network and time sequence network, so a simple network with high accuracy can be obtained, which improves both the recognition efficiency and the performance of the finally obtained recognition model.
Drawings
FIG. 1 is a diagram of an exemplary environment in which a training method for recognition models may be implemented;
FIG. 2 is a schematic flow chart diagram illustrating a training method for recognition models in one embodiment;
FIG. 2a is a schematic diagram of a network for identifying models, according to an embodiment;
FIG. 3 is a schematic flow chart diagram illustrating a training method for recognition models in another embodiment;
FIG. 3a is a schematic diagram of a first teacher network according to one embodiment;
FIG. 4 is a schematic flow chart diagram illustrating a training method for recognition models in another embodiment;
FIG. 4a is a schematic diagram of a second teacher network according to one embodiment;
FIG. 5 is a flow diagram that illustrates a method for character recognition, according to one embodiment;
FIG. 6 is a block diagram showing the structure of a training apparatus for recognizing a model according to an embodiment;
fig. 7 is a block diagram showing the structure of a character recognition apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The training method for the recognition model provided by the embodiment of the application can be applied to computer equipment shown in fig. 1. The computer device comprises a processor and a memory connected by a system bus, wherein a computer program is stored in the memory, and the steps of the method embodiments described below can be executed when the processor executes the computer program. Optionally, the computer device may further comprise a network interface, a display screen and an input device. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a nonvolatile storage medium storing an operating system and a computer program, and an internal memory. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. Optionally, the computer device may be a server, a personal computer, a personal digital assistant, other terminal devices such as a tablet computer, a mobile phone, and the like, or a cloud or a remote server, and the specific form of the computer device is not limited in the embodiment of the present application.
It should be noted that, in the training method for the recognition model provided in the embodiment of the present application, the execution subject may be a training device for the recognition model, and the training device for the recognition model may be implemented as part or all of a computer device by software, hardware, or a combination of software and hardware. In the following method embodiments, the execution subject is a computer device as an example.
In one embodiment, as shown in fig. 2, a training method for recognition model is provided, which is illustrated by applying the method to the computer device in fig. 1, and includes the following steps:
s201, training the initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network.
The first teacher network is a network with a relatively complex network structure, and the network structure of the feature extraction network is relatively simple compared with it. The first teacher network can be understood as the teacher network in knowledge distillation, and the feature extraction network as the student network. When the student network is too simple, the features it extracts are shallow, and when such shallow features are fed into a bidirectional long short-term memory network, the recognition performance of the model is poor; if a deeper network is used to extract features, a large amount of training data is required, otherwise the network easily overfits and training takes a long time. It should be noted that knowledge distillation can transfer the knowledge of one network to another, and the two networks can be isomorphic or heterogeneous. In knowledge distillation, a teacher network is trained first, and the student network is then trained using both the output of the teacher network and the real labels of the teacher's input data. Knowledge distillation can be used to compress a large network into a small network whose performance stays close to that of the large network; it can also transfer the knowledge learned by multiple networks into a single network, so that the performance of the single network approaches the ensembled result.
Specifically, the computer device trains an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network. Optionally, the computer device may input the sample character image into the first teacher network and the initial feature extraction network, and train the initial feature extraction network of the recognition model according to the output of the first teacher network and the output of the initial feature extraction network, so as to obtain the feature extraction network.
S202, training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; and in the training process of the initial time sequence network, the parameters of the feature extraction network are fixed and unchanged.
Specifically, after the computer device obtains the feature extraction network, the parameters in the feature extraction network are fixed, and a second teacher network is adopted to train the initial time sequence network of the recognition model according to the sample character images to obtain the time sequence network. The second teacher network is a network with a relatively complex network structure, and the network structure of the time sequence network is relatively simple compared with it. Optionally, the computer device may input the sample character image into the second teacher network and the initial time sequence network, and train the initial time sequence network according to the output of the second teacher network and the output of the initial time sequence network, so as to obtain the time sequence network.
And S203, obtaining a recognition model according to the feature extraction network and the time sequence network.
Specifically, the computer device obtains a recognition model according to the obtained feature extraction network and time sequence network. It should be noted that the recognition model includes a feature extraction network and a time sequence network; the initial feature extraction network is trained first to obtain the feature extraction network, the initial time sequence network is trained afterwards, and the recognition model is obtained once the initial time sequence network has been trained. Optionally, the network structure of the recognition model includes at least one resblock network and a long short-term memory network; exemplarily, as shown in fig. 2a, the network structure of the recognition model may include at least one resblock network and one long short-term memory network.
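The overall two-stage procedure of S201–S203 can be sketched as follows; the network objects, the `train_step` callback, and the `requires_grad` flag used to freeze parameters are hypothetical stand-ins for whatever training framework is actually used:

```python
def train_recognition_model(samples, teacher1, teacher2,
                            feature_net, timing_net, train_step):
    """Two-stage distillation: distil the feature extraction network from
    the first teacher, then freeze it and distil the time sequence network
    from the second teacher."""
    # Stage 1 (S201): train the feature extraction network.
    for image, label in samples:
        train_step(feature_net, teacher1, image, label)
    # Freeze the feature extraction network's parameters (S202 precondition).
    for p in feature_net.parameters:
        p.requires_grad = False
    # Stage 2 (S202): train the time sequence network with the second teacher.
    for image, label in samples:
        train_step(timing_net, teacher2, image, label)
    # S203: the recognition model is the pair of trained networks.
    return feature_net, timing_net
```

Because the stages are sequential, each `train_step` call only ever updates one sub-network, which is what keeps the training time lower than joint optimization of both networks.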
In the training method of the recognition model, the computer device trains the initial feature extraction network and the time sequence network of the recognition model respectively by using the first teacher network and the second teacher network, and the training process of the initial feature extraction network and the training process of the time sequence network are carried out separately, so that compared with the method for training the initial feature extraction network and the time sequence network simultaneously, the time consumption of the training process is reduced; in addition, the network structures of the initial feature extraction network and the initial time sequence network are simple, and after the initial feature extraction network and the initial time sequence network are trained stably, the initial feature extraction network and the initial time sequence network are optimized based on the obtained feature extraction network and the obtained time sequence network, so that a simple network with high accuracy can be obtained, the recognition efficiency of the obtained recognition model is improved, and the performance of the finally obtained recognition model is improved.
In the scene that the computer device adopts the first teacher network to train the initial feature extraction network of the recognition model according to the sample character images to obtain the feature extraction network, the computer device trains the initial feature extraction network according to the output of the first teacher network and the output of the initial feature extraction network. In an embodiment, as shown in fig. 3, the step S201 includes:
s301, inputting the sample character image into a feature extraction network of a first teacher network, obtaining a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputting the first feature map into a time sequence network of the first teacher network to obtain a first identification result; the first recognition result is a recognition result of a character in the sample character image.
Specifically, the computer device inputs the sample character image into a feature extraction network of a first teacher network, obtains a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputs the first feature map into a time sequence network of the first teacher network to obtain a first recognition result. And the first recognition result is the recognition result of the first teacher network on the characters in the sample character image. Alternatively, the network structure of the first teacher network includes the resnet50 and a bidirectional long-short term memory network, and illustratively, as shown in fig. 3a, the network structure of the first teacher network may include the resnet50 and a bidirectional long-short term memory network. It should be noted that the network structure of the first teacher network may be adjusted according to actual needs, for example, adding or deleting networks.
S302, inputting the sample character image into the initial feature extraction network, obtaining a first sample feature map of the sample character image through the initial feature extraction network, and inputting the first sample feature map into the initial time sequence network to obtain a first sample recognition result.
Specifically, the computer device inputs the sample character image into the initial feature extraction network of the recognition model, obtains a first sample feature map of the sample character image through the initial feature extraction network, and inputs the first sample feature map into the initial time sequence network to obtain a first sample recognition result for the characters in the sample character image. It should be noted that, in this embodiment, a time sequence network is also connected after the initial feature extraction network; optionally, the computer device inputs the sample character image into the initial feature extraction network, extracts the feature map of the sample character image through the initial feature extraction network, and the time sequence network then recognizes the characters in the sample character image according to the extracted feature map, so as to obtain the first sample recognition result for the sample character image.
And S303, obtaining a value of the loss function of the initial feature extraction network according to the first feature map, the first sample feature map and the first sample recognition result.
Specifically, the computer device obtains the value of the loss function of the initial feature extraction network according to the obtained first feature map, first sample feature map, and first sample recognition result. Optionally, the loss function of the initial feature extraction network may be calculated as: L1 = β * smoothL1loss + μ * CrossEntropy(y, pre), where L1 represents the loss function of the initial feature extraction network, β and μ are parameters, y represents the standard recognition result corresponding to the sample character image, pre represents the first sample recognition result, CrossEntropy represents the cross entropy loss function, and smoothL1loss is the loss value obtained from the first feature map and the first sample feature map. It should be noted that the smoothL1loss term in this formula could also be replaced by smoothL2loss; both can be used to update the parameters of the initial feature extraction network, but smoothL1loss works better, because it can prevent the excessively large gradients and unstable training caused by a large difference between the features of the first teacher network and those of the feature extraction network. It should also be noted that, in order to shorten the training time of the recognition model and improve its performance, based on the feature output size of each layer of the recognition model, a network layer with a corresponding size can be found in the first teacher network, and the parameters of the middle layers of the recognition model are learned based on smoothL1loss, where

smoothL1(x) = 0.5 * x², if |x| < 1; |x| − 0.5, otherwise

and x denotes the element-wise difference between the corresponding feature outputs of each layer of the recognition model and the matched teacher layer.
S304, training the initial feature extraction network according to the value of the loss function of the initial feature extraction network to obtain the feature extraction network.
Specifically, the computer device trains the initial feature extraction network according to the obtained value of the loss function of the initial feature extraction network, and determines the corresponding initial feature extraction network as the feature extraction network of the recognition model when the value of the loss function of the initial feature extraction network reaches a stable value or a minimum value.
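The stopping rule just described ("train until the value of the loss function reaches a stable value or a minimum value") can be sketched as a simple plateau check; the patience and tolerance values below are illustrative assumptions, not figures from the patent:

```python
def train_until_stable(loss_per_epoch, patience=3, tol=1e-4):
    """Return (stop_epoch, best_loss): stop once the loss has not improved
    by more than tol for `patience` consecutive epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(loss_per_epoch):
        if best - loss > tol:      # still improving: reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:   # loss has stabilized
                return epoch, best
    return len(loss_per_epoch) - 1, best
```

In practice the same check would be run on a held-out validation loss each epoch, with the corresponding network weights saved at the best epoch.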
In this embodiment, the computer device inputs the sample character image into the feature extraction network of the first teacher network, obtains a first feature map of the sample character image through that feature extraction network, and inputs the first feature map into the time sequence network of the first teacher network to obtain a first recognition result; it also inputs the sample character image into the initial feature extraction network, obtains a first sample feature map of the sample character image through the initial feature extraction network, and inputs the first sample feature map into the initial time sequence network to obtain a first sample recognition result. The value of the loss function of the initial feature extraction network can then be accurately obtained from the first feature map, the first sample feature map and the first sample recognition result, and the initial feature extraction network can in turn be accurately trained according to that value, thereby improving the accuracy of the obtained feature extraction network.
In the scenario where the computer device uses the second teacher network to train the initial time sequence network of the recognition model according to the sample character images to obtain the time sequence network, the computer device trains the initial time sequence network according to the output of the second teacher network and the output of the initial time sequence network. In one embodiment, as shown in fig. 4, the step S202 includes:
S401, inputting the sample character image into a feature extraction network of a second teacher network, obtaining a second feature map of the sample character image through the feature extraction network of the second teacher network, inputting the second feature map into a time sequence network of the second teacher network, obtaining a prediction probability value of the second feature map, and obtaining a second recognition result according to the prediction probability value of the second feature map; the second recognition result is a recognition result of the second teacher network for the characters in the sample character image.
Specifically, the computer device inputs the sample character image into the feature extraction network of the second teacher network, obtains a second feature map of the sample character image through that feature extraction network, inputs the second feature map into the time sequence network of the second teacher network, obtains the prediction probability value of the second feature map, and obtains a second recognition result according to that prediction probability value. The second recognition result is the recognition result of the second teacher network for the characters in the sample character image. Optionally, the network structure of the second teacher network includes resnet18 and a bidirectional long-short term memory network, exemplarily as shown in fig. 4a. It should be noted that the network structure of the second teacher network may be adjusted according to actual requirements, for example by adding or deleting networks.
S402, inputting the sample character image into a feature extraction network, obtaining a second sample feature map of the sample character image through the feature extraction network, inputting the second sample feature map into an initial time sequence network of the recognition model, obtaining a prediction probability value of the second sample feature map, and obtaining a second sample recognition result according to the prediction probability value of the second sample feature map.
Specifically, the computer device inputs the sample character image into the feature extraction network, obtains a second sample feature map of the sample character image through the feature extraction network, inputs the second sample feature map into the initial time sequence network of the recognition model, obtains the prediction probability value of the second sample feature map, and obtains a second sample recognition result for the characters in the sample character image according to that prediction probability value. It should be noted that, in this embodiment, a feature extraction network is connected before the initial time sequence network: optionally, the computer device inputs the sample character image into the feature extraction network, extracts the feature map of the sample character image through the feature extraction network, and the initial time sequence network then recognizes the characters in the sample character image according to the extracted feature map, thereby obtaining the second sample recognition result for the sample character image.
S403, obtaining the value of the loss function of the initial time sequence network according to the prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample recognition result and the standard recognition result corresponding to the sample character image.
Specifically, the computer device obtains the value of the loss function of the initial time sequence network according to the obtained prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample recognition result and the standard recognition result corresponding to the sample character image. Alternatively, the computer device may calculate the value of the loss function of the initial time sequence network of the recognition model by the following formula:

$$L_2 = \alpha T^2 \cdot \mathrm{KLdiv}\!\left(Q_S^{\tau}, Q_T^{\tau}\right) + (1-\alpha)\cdot \mathrm{CrossEntropy}\!\left(Q_S, y_{true}\right)$$

in the formula, $L_2$ represents the loss function of the initial time sequence network, $\alpha$ represents the weight, $T$ represents the temperature, KLdiv represents the KL divergence, $Q_S^{\tau}$ represents the (temperature-softened) prediction probability values of the second sample feature map, $Q_T^{\tau}$ represents the (temperature-softened) prediction probability values of the second feature map, CrossEntropy represents the cross entropy loss function, $Q_S$ represents the second sample recognition result, and $y_{true}$ represents the standard recognition result corresponding to the sample character image.
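A minimal numerical sketch of the distillation loss above follows. The softmax-with-temperature form of the softened probabilities, the KL-divergence argument order, and the helper names are assumptions based on common knowledge-distillation practice rather than details stated in the patent:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; larger T yields softer probabilities."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q):
    """Mean KL divergence KL(p || q) over the batch."""
    eps = 1e-12
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1).mean())

def cross_entropy(pred_probs, one_hot):
    eps = 1e-12
    return float(-(one_hot * np.log(pred_probs + eps)).sum(axis=-1).mean())

def timing_net_loss(student_logits, teacher_logits, one_hot, alpha=0.5, T=4.0):
    """L2 = alpha*T^2*KLdiv(student_soft, teacher_soft) + (1-alpha)*CrossEntropy."""
    soft = kl_div(softmax(student_logits, T), softmax(teacher_logits, T))
    hard = cross_entropy(softmax(student_logits), one_hot)
    return alpha * T * T * soft + (1.0 - alpha) * hard
```

The T² factor compensates for the 1/T² scaling the temperature introduces into the soft-target gradients, keeping the soft and hard terms on comparable scales.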
S404, training the initial time sequence network according to the value of the loss function of the initial time sequence network to obtain the time sequence network.
Specifically, the computer device trains the initial time sequence network according to the obtained value of the loss function of the initial time sequence network, and determines the corresponding initial time sequence network as the time sequence network of the recognition model when the value of the loss function of the initial time sequence network reaches a stable value or a minimum value.
In this embodiment, the computer device inputs the sample character image into the feature extraction network of the second teacher network, obtains a second feature map of the sample character image through that feature extraction network, and inputs the second feature map into the time sequence network of the second teacher network to obtain a second recognition result; it also inputs the sample character image into the feature extraction network, obtains a second sample feature map of the sample character image through the feature extraction network, inputs the second sample feature map into the initial time sequence network to obtain the prediction probability value of the second sample feature map, and obtains a second sample recognition result according to that prediction probability value. The value of the loss function of the initial time sequence network can be accurately obtained according to the prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample recognition result and the standard recognition result corresponding to the sample character image, and the initial time sequence network can in turn be accurately trained according to that value, thereby improving the accuracy of the obtained time sequence network.
In one embodiment, as shown in fig. 5, a character recognition method is provided, which is described by taking the method as an example applied to the computer device in fig. 1, and comprises the following steps:
S501, acquiring a character image to be recognized.
Specifically, the computer device acquires a character image to be recognized. Optionally, the computer device may obtain a character image to be recognized from a preset database, or may use an image that needs to be subjected to character recognition as the character image to be recognized according to actual requirements.
S502, inputting the character image to be recognized into a feature extraction network of a preset recognition model, and obtaining the features of the character image to be recognized through the feature extraction network.
Specifically, the computer device inputs the acquired character image to be recognized into a feature extraction network of a preset recognition model, and performs feature extraction on the character image to be recognized through the feature extraction network to obtain features of the character image to be recognized. Optionally, the network structure of the preset recognition model may include at least one resblock network and a long-short term memory network.
S503, inputting the characteristics of the character image to be recognized into a time sequence network of the recognition model, and obtaining the recognition result of the character image to be recognized through the time sequence network; the identification model is obtained by training an initial characteristic extraction network of the identification model by adopting a first teacher network and training an initial time sequence network of the identification model by adopting a second teacher network; and in the training process of the initial time sequence network, the parameters of the feature extraction network are fixed and unchanged.
Specifically, the computer device inputs the obtained features of the character image to be recognized into the time sequence network of the recognition model, and obtains the recognition result of the character image to be recognized through the time sequence network. The recognition model is obtained by training the initial feature extraction network of the recognition model with the first teacher network and training the initial time sequence network of the recognition model with the second teacher network, with the parameters of the feature extraction network fixed during the training of the initial time sequence network. Alternatively, the network structure of the first teacher network may include resnet50 and a bidirectional long-short term memory network, the network structure of the second teacher network may include resnet18 and a bidirectional long-short term memory network, and the network structure of the recognition model may include at least one resblock network and a long-short term memory network. It can be understood that because the network structure of the recognition model is simpler, the features it extracts may be shallow, and when shallow features are fed into the long-short term memory network, its recognition performance suffers; if a deeper network is instead used to extract the features, a large amount of training data is needed, otherwise overfitting easily occurs and training takes a long time. Therefore, the first teacher network is used to train the initial feature extraction network of the recognition model, the parameters of the feature extraction network are then fixed, and the second teacher network is used to train the initial time sequence network of the recognition model, shortening the training time of the recognition model while improving its performance.
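The "fix the feature extraction network, then train the time sequence network" scheme can be illustrated with a toy freeze flag. Everything here (the Stage class, the tanh layer, the placeholder gradients) is an invented illustration of parameter freezing, not the patent's actual networks:

```python
import numpy as np

class Stage:
    """Toy stand-in for one trainable part of the recognition model."""
    def __init__(self, dim, seed):
        self.w = np.random.default_rng(seed).normal(size=(dim, dim))
        self.frozen = False

    def forward(self, x):
        return np.tanh(x @ self.w)

    def step(self, grad, lr=0.01):
        if not self.frozen:  # frozen parameters are never updated
            self.w -= lr * grad

feature_net = Stage(4, seed=0)  # distilled from the first teacher network
timing_net = Stage(4, seed=1)   # distilled from the second teacher network

# Stage 1: train the feature extraction network (placeholder gradient).
feature_net.step(np.ones((4, 4)))

# Stage 2: fix the feature extraction network, train only the timing network.
feature_net.frozen = True
w_fixed = feature_net.w.copy()
timing_net.step(np.ones((4, 4)))
feature_net.step(np.ones((4, 4)))  # no-op: parameters stay fixed
assert np.array_equal(feature_net.w, w_fixed)
```

In a framework such as PyTorch the same effect would typically be achieved by setting `requires_grad=False` on the feature extractor's parameters before the second training stage.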
In this embodiment, the recognition model is obtained by training the initial feature extraction network of the recognition model with the first teacher network and then, with the parameters of the feature extraction network fixed, training the initial time sequence network of the recognition model with the second teacher network; the network structure of the recognition model is simpler than that of the first teacher network and that of the second teacher network.
To facilitate understanding by those skilled in the art, the character recognition method provided in the present application is described in detail below. The method may include:
S601, inputting the sample character image into a feature extraction network of a first teacher network, obtaining a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputting the first feature map into a time sequence network of the first teacher network to obtain a first recognition result; the first recognition result is a recognition result of the first teacher network on characters in the sample character image.
S602, inputting the sample character image into an initial feature extraction network, obtaining a first sample feature map of the sample character image through the initial feature extraction network, and inputting the first sample feature map into an initial time sequence network to obtain a first sample identification result.
S603, calculating the value of the loss function of the initial feature extraction network according to the formula L1 = β·smoothL1loss + μ·CrossEntropy(y, pre), where L1 represents the loss function of the initial feature extraction network, β and μ are parameters, y represents the standard recognition result corresponding to the sample character image, pre represents the first sample recognition result, CrossEntropy represents the cross entropy loss function, and smoothL1loss is the loss value obtained from the first feature map and the first sample feature map.
S604, training the initial feature extraction network according to the value of the loss function of the initial feature extraction network to obtain the feature extraction network of the recognition model.
S605, inputting the sample character image into a feature extraction network of a second teacher network, obtaining a second feature map of the sample character image through the feature extraction network of the second teacher network, inputting the second feature map into a time sequence network of the second teacher network, obtaining a prediction probability value of the second feature map, and obtaining a second recognition result according to the prediction probability value of the second feature map; the second recognition result is a recognition result of the second teacher network for the characters in the sample character image.
S606, fixing the parameters of the feature extraction network unchanged, inputting the sample character image into the feature extraction network, obtaining a second sample feature map of the sample character image through the feature extraction network, inputting the second sample feature map into the initial time sequence network of the recognition model, obtaining the prediction probability value of the second sample feature map, and obtaining a second sample recognition result according to the prediction probability value of the second sample feature map.
S607, calculating the value of the loss function of the initial time sequence network according to the formula

$$L_2 = \alpha T^2 \cdot \mathrm{KLdiv}\!\left(Q_S^{\tau}, Q_T^{\tau}\right) + (1-\alpha)\cdot \mathrm{CrossEntropy}\!\left(Q_S, y_{true}\right)$$

wherein $L_2$ represents the loss function of the initial time sequence network, $\alpha$ represents the weight, $T$ represents the temperature, KLdiv represents the KL divergence, $Q_S^{\tau}$ represents the prediction probability values of the second sample feature map, $Q_T^{\tau}$ represents the prediction probability values of the second feature map, CrossEntropy represents the cross entropy loss function, $Q_S$ represents the second sample recognition result, and $y_{true}$ represents the standard recognition result corresponding to the sample character image.
S608, training the initial time sequence network according to the loss function value of the initial time sequence network to obtain the time sequence network of the recognition model.
S609, acquiring a character image to be recognized.
S610, inputting the character image to be recognized into a feature extraction network of a preset recognition model, and obtaining the feature of the character image to be recognized through the feature extraction network.
S611, inputting the characteristics of the character image to be recognized into a time sequence network of the recognition model, and obtaining the recognition result of the character image to be recognized through the time sequence network.
It should be understood that although the various steps in the flowcharts of figs. 2-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in figs. 2-5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a training apparatus for recognizing a model, including: first training module, second training module and first acquisition module, wherein:
and the first training module is used for training the initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain the feature extraction network.
The second training module is used for training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; and in the training process of the initial time sequence network, the parameters of the feature extraction network are fixed and unchanged.
The first acquisition module is used for obtaining the recognition model according to the feature extraction network and the time sequence network.
The training device for identifying models provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
Optionally, the network structure of the first teacher network includes resnet50 and a bidirectional long-short term memory network; the network structure of the second teacher network includes resnet18 and a bidirectional long-short term memory network; the network structure of the recognition model comprises at least one resblock network and a long-short term memory network.
On the basis of the foregoing embodiment, optionally, the first training module includes: first acquisition unit, second acquisition unit, third acquisition unit and first training unit, wherein:
the first acquisition unit is used for inputting the sample character image into a feature extraction network of a first teacher network, obtaining a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputting the first feature map into a time sequence network of the first teacher network to obtain a first identification result; the first recognition result is a recognition result of characters in the sample character image by the first teacher network;
the second acquisition unit is used for inputting the sample character image into the initial feature extraction network, obtaining a first sample feature map of the sample character image through the initial feature extraction network, and inputting the first sample feature map into the initial time sequence network to obtain a first sample identification result;
a third obtaining unit, configured to obtain a value of a loss function of the initial feature extraction network according to the first feature map, the first sample feature map, and the first sample identification result;
and the first training unit is used for training the initial feature extraction network according to the value of the loss function of the initial feature extraction network to obtain the feature extraction network.
Optionally, the calculation formula of the loss function of the initial feature extraction network is: L1 = β·smoothL1loss + μ·CrossEntropy(y, pre), where L1 represents the loss function of the initial feature extraction network, β and μ are parameters, y represents the standard recognition result corresponding to the sample character image, pre represents the first sample recognition result, CrossEntropy represents the cross entropy loss function, and smoothL1loss is the loss value obtained from the first feature map and the first sample feature map.
The training device for identifying models provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
On the basis of the foregoing embodiment, optionally, the second training module includes: a fourth acquisition unit, a fifth acquisition unit, a sixth acquisition unit, and a second training unit, wherein:
the fourth obtaining unit is used for inputting the sample character image into a feature extraction network of a second teacher network, obtaining a second feature map of the sample character image through the feature extraction network of the second teacher network, inputting the second feature map into a time sequence network of the second teacher network, obtaining a prediction probability value of the second feature map, and obtaining a second recognition result according to the prediction probability value of the second feature map; the second recognition result is a recognition result of the second teacher network on characters in the sample character image;
the fifth obtaining unit is used for inputting the sample character image into the feature extraction network, obtaining a second sample feature map of the sample character image through the feature extraction network, inputting the second sample feature map into the initial time sequence network of the recognition model, obtaining the prediction probability value of the second sample feature map, and obtaining a second sample recognition result according to the prediction probability value of the second sample feature map;
a sixth obtaining unit, configured to obtain a value of a loss function of the initial time series network according to the prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample identification result, and the standard identification result corresponding to the sample character image;
and the second training unit is used for training the initial time sequence network according to the value of the loss function of the initial time sequence network to obtain the time sequence network.
Optionally, the calculation formula of the loss function of the initial time sequence network is:

$$L_2 = \alpha T^2 \cdot \mathrm{KLdiv}\!\left(Q_S^{\tau}, Q_T^{\tau}\right) + (1-\alpha)\cdot \mathrm{CrossEntropy}\!\left(Q_S, y_{true}\right)$$

in the formula, $L_2$ represents the loss function of the initial time sequence network, $\alpha$ represents the weight, $T$ represents the temperature, KLdiv represents the KL divergence, $Q_S^{\tau}$ represents the prediction probability values of the second sample feature map, $Q_T^{\tau}$ represents the prediction probability values of the second feature map, CrossEntropy represents the cross entropy loss function, $Q_S$ represents the second sample recognition result, and $y_{true}$ represents the standard recognition result corresponding to the sample character image.
The training device for identifying models provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
For the specific definition of the training device for the recognition model, reference may be made to the above definition of the training method for the recognition model, and details are not repeated here. The modules in the training device for identifying the model can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, as shown in fig. 7, there is provided a character recognition apparatus including: second acquisition module, feature extraction module and identification module, wherein:
the second acquisition module is used for acquiring a character image to be recognized;
the character recognition system comprises a characteristic extraction module, a character recognition module and a recognition module, wherein the characteristic extraction module is used for inputting a character image to be recognized into a characteristic extraction network of a preset recognition model and obtaining the characteristics of the character image to be recognized through the characteristic extraction network;
the recognition module is used for inputting the characteristics of the character image to be recognized into a time sequence network of the recognition model and obtaining the recognition result of the character image to be recognized through the time sequence network;
the identification model is obtained by training an initial characteristic extraction network of the identification model by adopting a first teacher network and training an initial time sequence network of the identification model by adopting a second teacher network; and in the training process of the initial time sequence network, the parameters of the feature extraction network are fixed and unchanged.
The character recognition apparatus provided in this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
For the specific definition of the character recognition device, reference may be made to the above definition of the character recognition method, which is not described herein again. The respective modules in the character recognition apparatus described above may be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein, the parameters of the characteristic extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining the recognition model according to the feature extraction network and the time sequence network.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein, the parameters of the characteristic extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining the recognition model according to the feature extraction network and the time sequence network.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A training method for recognition models, the method comprising:
training an initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and obtaining the recognition model according to the feature extraction network and the time sequence network.
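Claim 1's constraint that the feature extraction network stays fixed while the time sequence network is trained can be sketched as follows. This is an illustrative PyTorch sketch only; the attribute names `feature_extraction_net` and `timing_net`, the optimizer choice, and the learning rate are all hypothetical, as the claim names none of them.

```python
import torch
import torch.nn as nn

def freeze_feature_extraction(recognition_model):
    """Fix the feature extraction network's parameters (as in claim 1) so that
    only the initial time sequence network is updated in the second stage.
    Attribute names here are hypothetical."""
    for p in recognition_model.feature_extraction_net.parameters():
        p.requires_grad = False  # excluded from gradient updates
    # Hand only the time sequence network's parameters to the optimizer
    return torch.optim.Adam(recognition_model.timing_net.parameters(), lr=1e-3)
```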
2. The method of claim 1, wherein training an initial feature extraction network of the recognition model using a first teacher network based on the sample character images to obtain a feature extraction network comprises:
inputting the sample character image into a feature extraction network of the first teacher network, obtaining a first feature map of the sample character image through the feature extraction network of the first teacher network, and inputting the first feature map into a time sequence network of the first teacher network to obtain a first recognition result; the first recognition result is a recognition result of the first teacher network on characters in the sample character image;
inputting the sample character image into the initial feature extraction network, obtaining a first sample feature map of the sample character image through the initial feature extraction network, and inputting the first sample feature map into the initial time sequence network to obtain a first sample recognition result;
obtaining a value of a loss function of the initial feature extraction network according to the first feature map, the first sample feature map and the first sample recognition result;
and training the initial feature extraction network according to the value of the loss function of the initial feature extraction network to obtain the feature extraction network.
3. The method of claim 2, wherein the loss function of the initial feature extraction network is calculated as: L1 = β*smoothL1loss + μ*crossEntropy(y, pre), where L1 represents the loss function of the initial feature extraction network, β and μ are weighting parameters, y represents the standard recognition result corresponding to the sample character image, pre represents the first sample recognition result, crossEntropy represents the cross entropy loss function, and smoothL1loss is the loss value obtained from the first feature map and the first sample feature map.
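The claim-3 loss combines a Smooth-L1 term between the teacher's first feature map and the student's first sample feature map with a weighted cross entropy term on the student's recognition result. A minimal PyTorch sketch, in which the function name, argument names, and default weights β = μ = 1.0 are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def feature_extraction_loss(first_feature_map, first_sample_feature_map,
                            first_sample_logits, y, beta=1.0, mu=1.0):
    """Sketch of L1 = beta*smoothL1loss + mu*crossEntropy(y, pre):
    a Smooth-L1 distance between teacher and student feature maps, plus a
    supervised cross entropy term on the student's prediction."""
    smooth_l1 = F.smooth_l1_loss(first_sample_feature_map, first_feature_map)
    ce = F.cross_entropy(first_sample_logits, y)
    return beta * smooth_l1 + mu * ce
```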
4. The method of claim 1, wherein training an initial time series network of the recognition model using a second teacher network from the sample character images to obtain a time series network comprises:
inputting the sample character image into a feature extraction network of the second teacher network, obtaining a second feature map of the sample character image through the feature extraction network of the second teacher network, inputting the second feature map into a time sequence network of the second teacher network, obtaining a prediction probability value of the second feature map, and obtaining a second recognition result according to the prediction probability value of the second feature map; the second recognition result is a recognition result of the second teacher network on characters in the sample character image;
inputting the sample character image into the feature extraction network, obtaining a second sample feature map of the sample character image through the feature extraction network, inputting the second sample feature map into an initial time sequence network of the recognition model, obtaining a prediction probability value of the second sample feature map, and obtaining a second sample recognition result according to the prediction probability value of the second sample feature map;
obtaining a value of a loss function of the initial time sequence network according to the prediction probability value of the second feature map, the prediction probability value of the second sample feature map, the second sample recognition result and the standard recognition result corresponding to the sample character image;
and training the initial time sequence network according to the value of the loss function of the initial time sequence network to obtain the time sequence network.
5. The method of claim 4, wherein the loss function of the initial time sequence network is calculated as:

L2 = α*T^2*KLdiv(Q_S^τ, Q_T^τ) + (1-α)*crossEntropy(Q_S, y_true)

where L2 represents the loss function of the initial time sequence network, α represents the weight, T represents the temperature, KLdiv represents the KL divergence, Q_S^τ represents the prediction probability value of the second sample feature map, Q_T^τ represents the prediction probability value of the second feature map, crossEntropy represents the cross entropy loss function, Q_S represents the second sample recognition result, and y_true represents the standard recognition result corresponding to the sample character image.
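The claim-5 loss has the shape of a standard knowledge-distillation objective: a temperature-softened KL divergence between the student's and the teacher's prediction probability values, plus a cross entropy term against the standard recognition result. The sketch below assumes the conventional T² scaling of the KL term (common distillation practice, not stated in the claim), and the function and argument names are hypothetical:

```python
import torch
import torch.nn.functional as F

def timing_network_loss(student_logits, teacher_logits, y_true, alpha=0.5, T=4.0):
    """Sketch of L2 = alpha*T^2*KLdiv(Q_S^tau, Q_T^tau)
                     + (1-alpha)*crossEntropy(Q_S, y_true)."""
    log_q_s_tau = F.log_softmax(student_logits / T, dim=-1)  # Q_S^tau (log form)
    q_t_tau = F.softmax(teacher_logits / T, dim=-1)          # Q_T^tau
    # KL divergence between softened distributions, rescaled by T^2
    kd = F.kl_div(log_q_s_tau, q_t_tau, reduction="batchmean") * (T * T)
    # Supervised cross entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, y_true)
    return alpha * kd + (1.0 - alpha) * ce
```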
6. The method of any of claims 1-5, wherein the network structure of the first teacher network comprises a resnet50 and a bidirectional long short-term memory network; the network structure of the second teacher network comprises a resnet18 and a bidirectional long short-term memory network; and the network structure of the recognition model comprises at least one resblock network and a long short-term memory network.
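The claim-6 student structure (at least one resblock plus an LSTM) might look like the following PyTorch sketch. Every layer size, the class name, the height-pooling step, and the class count of 37 are illustrative assumptions; the claim specifies only the block types.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentRecognizer(nn.Module):
    """Illustrative sketch: one residual block (resblock) for feature
    extraction, then a long short-term memory network over the width axis."""

    def __init__(self, in_channels=1, channels=64, hidden=128, num_classes=37):
        super().__init__()
        self.conv_in = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.resblock = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.pool = nn.AdaptiveAvgPool2d((1, None))  # collapse height to 1
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                              # x: [B, C, H, W]
        f = self.conv_in(x)
        f = F.relu(f + self.resblock(f))               # residual connection
        f = self.pool(f).squeeze(2).permute(0, 2, 1)   # [B, W, channels]
        out, _ = self.lstm(f)                          # sequence over width
        return self.fc(out)                            # [B, W, num_classes]
```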
7. A method of character recognition, the method comprising:
acquiring a character image to be recognized;
inputting the character image to be recognized into a feature extraction network of a preset recognition model, and obtaining the features of the character image to be recognized through the feature extraction network;
inputting the characteristics of the character image to be recognized into a time sequence network of the recognition model, and obtaining the recognition result of the character image to be recognized through the time sequence network;
the recognition model is obtained by training an initial feature extraction network of the recognition model by adopting a first teacher network and training an initial time sequence network of the recognition model by adopting a second teacher network; and in the training process of the initial time sequence network, the parameters of the feature extraction network are fixed and unchanged.
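The claim-7 inference flow (feature extraction, then the time sequence network, then a recognition result) can be sketched as below. The networks, the `charset` index-to-character mapping, the blank index 0, and the greedy CTC-style decoding are all hypothetical details the claim does not specify:

```python
import torch

def recognize(image, feature_extraction_net, time_sequence_net, charset, blank=0):
    """Sketch: extract features from the character image, score each time step
    with the time sequence network, then greedily decode (collapse repeated
    indices, drop blanks) into a character string."""
    with torch.no_grad():
        features = feature_extraction_net(image)   # feature sequence
        logits = time_sequence_net(features)       # [T, num_classes] scores
        indices = logits.argmax(dim=-1).tolist()   # greedy per-step choice
    chars, prev = [], blank
    for idx in indices:
        if idx != prev and idx != blank:           # new, non-blank symbol
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)
```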
8. A training apparatus for recognizing a model, the apparatus comprising:
the first training module is used for training the initial feature extraction network of the recognition model by adopting a first teacher network according to the sample character images to obtain a feature extraction network;
the second training module is used for training the initial time sequence network of the recognition model by adopting a second teacher network according to the sample character images to obtain a time sequence network; wherein the parameters of the feature extraction network are fixed and unchanged in the training process of the initial time sequence network;
and the first obtaining module is used for obtaining the recognition model according to the feature extraction network and the time sequence network.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010672462.3A 2020-07-14 2020-07-14 Training method of recognition model, character recognition method, device, equipment and medium Withdrawn CN111898620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010672462.3A CN111898620A (en) 2020-07-14 2020-07-14 Training method of recognition model, character recognition method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010672462.3A CN111898620A (en) 2020-07-14 2020-07-14 Training method of recognition model, character recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN111898620A true CN111898620A (en) 2020-11-06

Family

ID=73192566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010672462.3A Withdrawn CN111898620A (en) 2020-07-14 2020-07-14 Training method of recognition model, character recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111898620A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313022A * 2021-05-27 2021-08-27 北京百度网讯科技有限公司 Training method of character recognition model and method for recognizing characters in image
CN113313022B * 2021-05-27 2023-11-10 北京百度网讯科技有限公司 Training method of character recognition model and method for recognizing characters in image

Similar Documents

Publication Publication Date Title
US10803359B2 (en) Image recognition method, apparatus, server, and storage medium
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
WO2019100724A1 (en) Method and device for training multi-label classification model
CN109740620B (en) Method, device, equipment and storage medium for establishing crowd figure classification model
CN112949507B (en) Face detection method, device, computer equipment and storage medium
CN110930296B (en) Image processing method, device, equipment and storage medium
CN110378230B (en) Missing face recognition method, device, computer equipment and storage medium
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN110738102A (en) face recognition method and system
CN113204659B (en) Label classification method and device for multimedia resources, electronic equipment and storage medium
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
CN108564102A (en) Image clustering evaluation of result method and apparatus
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
CN113255763B (en) Model training method, device, terminal and storage medium based on knowledge distillation
CN110807472B (en) Image recognition method and device, electronic equipment and storage medium
CN115018039A (en) Neural network distillation method, target detection method and device
CN111898620A (en) Training method of recognition model, character recognition method, device, equipment and medium
US20230401737A1 (en) Method for training depth estimation model, training apparatus, and electronic device applying the method
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN116484224A (en) Training method, device, medium and equipment for multi-mode pre-training model
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN115272667B (en) Farmland image segmentation model training method and device, electronic equipment and medium
CN113743448B (en) Model training data acquisition method, model training method and device
CN113537398A (en) Color value evaluation model training method and component, and color value evaluation method and component
CN114972910A (en) Image-text recognition model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201106)