CN111667066A - Network model training and character recognition method and device and electronic equipment - Google Patents

Network model training and character recognition method and device and electronic equipment

Info

Publication number
CN111667066A
Authority
CN
China
Prior art keywords
model
trained
target
character recognition
training sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010330213.6A
Other languages
Chinese (zh)
Inventor
张婕蕾
万昭祎
姚聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202010330213.6A priority Critical patent/CN111667066A/en
Publication of CN111667066A publication Critical patent/CN111667066A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/418Document matching, e.g. of document images

Abstract

The invention provides a network model training and character recognition method, a device and electronic equipment, relating to the technical field of artificial intelligence. The method comprises: obtaining a plurality of models to be trained and target training samples for the plurality of models to be trained; performing character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character; determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample, wherein the relative entropy loss value represents the degree of difference between the character recognition results; and adjusting the model parameters of the corresponding model to be trained according to the relative entropy loss value. This solves the technical problem of poor recognition accuracy when existing character recognition models perform character recognition.

Description

Network model training and character recognition method and device and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a network model training and character recognition method, a network model training and character recognition device and electronic equipment.
Background
In the working process, people often need to process the characters in a picture, and because characters in a picture cannot be edited directly, they must first be recognized. In the prior art, an Optical Character Recognition (OCR) model can generally be used to recognize the characters in a picture, but the accuracy of the characters recognized by such a model is low. With the development of artificial intelligence technology, characters can now be recognized by deep learning algorithms. In the deep learning field, there are many methods for recognizing characters, for example: the first is an attention-based character decoder (attention-decoder); the second is a model based on CTC loss (Connectionist Temporal Classification); the third is an image segmentation network (segmentation).
In the use of these models, it has been found that the attention-based character decoder has stronger sequence modeling capability as a language model, that is, a stronger ability to use linguistic context, while the image segmentation network focuses on the processing of image features. However, a model that relies on sequence modeling capability cannot recognize words that did not appear in the training set, while a model that emphasizes image features suffers a large drop in recognition accuracy when the image quality is poor.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for training a network model and character recognition, and an electronic device, so as to solve the technical problem that the recognition accuracy is poor in the process of performing character recognition by using the existing character recognition model.
In a first aspect, an embodiment of the present invention provides a method for training a network model, including: obtaining a plurality of models to be trained and target training samples of the plurality of models to be trained; respectively performing character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character; determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results; and adjusting the model parameters of the corresponding model to be trained according to the relative entropy loss value.
Further, determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample comprises: determining a character recognition result of a first model to be trained and a character recognition result of a second model to be trained in the plurality of character recognition results, and respectively obtaining a first character recognition result and a second character recognition result, wherein the first model to be trained is a model of the plurality of models to be trained of which the relative entropy loss value is to be calculated at the current moment, and the second model to be trained is another model of the plurality of models to be trained except the first model to be trained; calculating KL divergence between the first character recognition result and the second character recognition result to obtain a target KL divergence value; and determining a relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample.
Further, the number of the second models to be trained is multiple; each second model to be trained corresponds to a second character recognition result; calculating the KL divergence between the first and second character recognition results comprises: calculating KL divergence between the first character recognition result and each second character recognition result to obtain a plurality of target KL divergence values; determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises: determining a relative entropy loss value of the first model to be trained based on the plurality of target KL divergence values, the first character recognition result and the label information of the target training sample.
Further, the label information is used for representing the actual probability that the character to be recognized in the target training sample is each preset character; determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises: summing the prediction probability of each character to be recognized in the first character recognition result and the actual probability of the corresponding character to be recognized in the label information to obtain a target calculation result corresponding to each character to be recognized; and summing the target calculation result corresponding to each character to be recognized and the target KL divergence value to obtain a relative entropy loss value of the first model to be trained.
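The combination of the target KL divergence value with the label information described above can be sketched as follows. This is an illustrative Python sketch only: the function names (`kl_divergence`, `relative_entropy_loss`) are hypothetical, and both the interpretation of the supervised term as a cross-entropy against the label information and the direction of the KL term are assumptions not fixed by the text.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i), with eps for numerical safety
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def relative_entropy_loss(first_result, second_result, labels, eps=1e-12):
    # first_result / second_result: per-character predicted distributions over
    # the preset character set; labels: per-character actual distributions
    # taken from the target training sample's label information.
    ce = 0.0   # supervised term against the labels (assumed to be cross-entropy)
    kl = 0.0   # divergence between the two models' recognition results
    for p1, p2, y in zip(first_result, second_result, labels):
        ce += -sum(yi * math.log(pi + eps) for yi, pi in zip(y, p1))
        kl += kl_divergence(p2, p1, eps)
    return ce + kl
```

With multiple second models to be trained, the KL term would be accumulated once per second character recognition result before being combined with the supervised term.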
Further, calculating the KL divergence between the first and second character recognition results comprises: transforming the first character recognition result to obtain a first logits vector; transforming the second character recognition result to obtain a second logits vector; and calculating a KL divergence between the first and second logits vectors.
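The transformation of recognition results into logits vectors before the KL computation might look like the following sketch. The text does not specify the exact transform, so the log-space `to_logits` step and the softmax renormalization inside the KL computation are assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the maximum before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def to_logits(probs, eps=1e-12):
    # Map a probability vector back into log space (one possible "transform"
    # of a character recognition result into a logits vector).
    return [math.log(p + eps) for p in probs]

def kl_between_logits(logits_p, logits_q):
    # Normalize both logits vectors back to distributions, then compute KL.
    p, q = softmax(logits_p), softmax(logits_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```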
Further, the plurality of models to be trained include the following types of models: a neural network model based on an attention mechanism and an image segmentation network model.
In a second aspect, an embodiment of the present invention further provides a method for training a network model, including: obtaining a plurality of models to be trained and a plurality of target training sample groups of the models to be trained; the determination modes of the sample labels in different target training sample groups are different, and the determination modes of the sample labels are associated with the types of the models to be trained; training each model to be trained by using each target training sample group by the method of any one of the first aspect above; testing the trained model to be trained by using a target test set to obtain a plurality of model test results, wherein one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model; determining a balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute a plurality of operations; and training each model to be trained again based on the balance ability index.
Further, the plurality of models to be trained include the following types of models: a neural network model based on an attention mechanism and an image segmentation network model; the plurality of target training sample groups comprise: a first type of target training sample and a second type of target training sample; the label information of each type of target training sample is determined in the following manner: randomly generating, for the first type of target training sample, the probability that each character to be recognized is each preset character to obtain the label information of the first type of target training sample; and acquiring a target test set for testing the plurality of models to be trained, and determining the label information of the second type of target training sample based on the label information of the target test set.
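The random generation of label information for the first type of target training sample might be sketched as follows; normalizing the random values into a probability distribution per character position is an assumption, since the text only says the probabilities are randomly generated.

```python
import random

def random_label_info(num_chars, charset_size, seed=None):
    # For each character to be recognized, randomly generate the probability
    # that it is each preset character, normalized into a distribution.
    rng = random.Random(seed)
    labels = []
    for _ in range(num_chars):
        weights = [rng.random() for _ in range(charset_size)]
        total = sum(weights)
        labels.append([w / total for w in weights])
    return labels
```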
Further, the length of the label information of the first type of target training sample is the same as the length of the label information of the second type of target training sample, and the length of the label information of the second type of target training sample is the same as the length of the label information of the target test set.
Further, determining label information for the second type of target training sample based on the label information for the target test set comprises: determining a test sample in the target test set, wherein the text sequence contained in the test sample is the same as the text sequence contained in the second type of target training sample; determining label information for the second type of target training sample based on the label information for the test sample.
Further, determining label information for the second type of target training sample based on the label information for the test sample comprises: determining the label information of the test sample as the label information of the second type of target training sample; and/or converting the label information of the test sample according to a preset writing format to obtain converted label information, and determining the converted label information as the label information of the second type of target training sample.
Further, the plurality of model test results includes: a first model test result and a second model test result; determining the balance ability index of each model to be trained based on the model test result comprises: calculating a difference between the first model test result and the second model test result, and determining the difference as the balance ability index; training each model to be trained again based on the balance ability index comprises: and if the difference is larger than the preset difference, training each model to be trained again.
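The balance ability index described above can be sketched as follows; reading "calculating a difference" as the absolute difference between the two models' test accuracies, and the retraining condition, are assumptions.

```python
def balance_ability_index(first_test_result, second_test_result):
    # Difference between the two model test results (assumed absolute).
    return abs(first_test_result - second_test_result)

def needs_retraining(first_test_result, second_test_result, preset_difference):
    # Train each model again if the balance ability index exceeds the
    # preset difference.
    return balance_ability_index(first_test_result, second_test_result) > preset_difference
```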
In a third aspect, an embodiment of the present invention further provides a text recognition method, including: acquiring an image to be identified; and performing character recognition on the image to be recognized through a target neural network to obtain a character recognition result, wherein the target neural network is a model obtained by training by adopting the method of any one of the first aspect or the second aspect.
In a fourth aspect, an embodiment of the present invention further provides a device for training a network model, including: the device comprises a first acquisition unit, a second acquisition unit and a control unit, wherein the first acquisition unit is used for acquiring a plurality of models to be trained and target training samples of the plurality of models to be trained; the recognition processing unit is used for respectively carrying out character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character; a first determining unit, configured to determine a relative entropy loss value of each model to be trained based on the multiple character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results; and the adjusting unit is used for adjusting the model parameters of the corresponding model to be trained through the relative entropy loss value.
In a fifth aspect, an embodiment of the present invention further provides a device for training a network model, including: the second acquisition unit is used for acquiring a plurality of models to be trained and a plurality of target training sample sets of the models to be trained; the determination modes of the sample labels in different target training sample groups are different, and the determination modes of the sample labels are associated with the types of the models to be trained; a first training unit, configured to train each model to be trained by using each target training sample group through the method according to any one of the first aspect; the test unit is used for testing the trained model to be trained by utilizing a target test set to obtain a plurality of model test results, wherein one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model; a second determining unit, configured to determine a balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute a plurality of operations; and the second training unit is used for training each model to be trained again based on the balance ability index.
In a sixth aspect, an embodiment of the present invention further provides a text recognition apparatus, including: a third acquisition unit configured to acquire an image to be recognized; and the character recognition unit is used for performing character recognition on the image to be recognized through a target neural network to obtain a character recognition result, wherein the target neural network is a model obtained by training by adopting the method of any one of the first aspect or the second aspect.
In a seventh aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processing device, and a computer program stored on the memory and executable on the processing device, where the processing device implements the steps of the method in any one of the first aspect or the second aspect when executing the computer program.
In an eighth aspect, the present invention further provides a computer-readable medium having a non-volatile program code executable by a processing device, where the program code causes the processing device to perform the steps of the method in any one of the first aspect or the second aspect above.
In the embodiment of the invention, a plurality of models to be trained and target training samples for the models to be trained are first obtained; character recognition processing is then performed on the target training samples through each model to be trained to obtain a plurality of character recognition results; next, a relative entropy loss value of each model to be trained is determined based on the plurality of character recognition results and the label information of the target training samples; finally, the model parameters of the corresponding model to be trained are adjusted according to the relative entropy loss value. According to the above description, in the present application, by calculating the relative entropy loss value of each model to be trained and training the models to be trained with these values, mutual learning among the plurality of models to be trained can be realized, so that any model to be trained acquires the functions or capabilities of the other models to be trained. This improves the adaptability of the models to be trained and the character recognition accuracy of the models, and solves the technical problem of poor recognition accuracy when existing character recognition models perform character recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the invention;
FIG. 2 is a flow chart of a method of training a network model according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method of training a network model according to an embodiment of the invention;
FIG. 4 is a flow chart of yet another method of training a network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a target training sample in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of a network model training apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a training apparatus for a network model according to another embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
First, an electronic device 100 for implementing an embodiment of the present invention, which may be used to run the network model training method of embodiments of the present invention, is described with reference to fig. 1.
As shown in fig. 1, electronic device 100 includes one or more processing devices 102, one or more memories 104. Optionally, the electronic device 100 may also include an input device 106, an output device 108, and a data acquisition device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are merely exemplary and not limiting, and the electronic device may also have some of the components shown in fig. 1 or other components and structures not shown in fig. 1, as desired.
The processing device 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), or an Application-Specific Integrated Circuit (ASIC). The processing device 102 may be a Central Processing Unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. On which one or more computer program instructions may be stored that may be executed by processing device 102 to implement client functionality (implemented by the processing device) and/or other desired functionality in embodiments of the present invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The data acquisition device 110 is configured to obtain a plurality of models to be trained and target training samples of the plurality of models to be trained; the data acquired by the data acquisition device is then used by the network model training method to obtain trained models.
Example 2:
In accordance with an embodiment of the present invention, an embodiment of a method for training a network model is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that herein.
Fig. 2 is a flowchart of a method for training a network model according to an embodiment of the present invention, as shown in fig. 2, the method includes the following steps:
step S202, a plurality of models to be trained and target training samples of the plurality of models to be trained are obtained.
In the present application, the capabilities or functions of the plurality of models to be trained may not be identical. For example, the plurality of models to be trained may include an attention-based character decoder (attention-decoder), whose capability focuses on the sequence modeling capability of a language model. As another example, the models to be trained may include an image segmentation network (segmentation), whose capability focuses on image feature processing.
In the present application, the attention-based character decoder and the image segmentation network are described only as examples. In addition to these two models, models with other capabilities or functions may be selected, and the present application is not limited thereto.
It should be noted that, in the present application, the model structures of the models to be trained may be the same in size or may be different in size. For example, the plurality of models to be trained may include models with larger model sizes and include models with smaller model sizes.
Step S204, respectively performing character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character.
In the present application, the character to be recognized may be any recognizable character, such as a Chinese character or an uppercase or lowercase English letter; the content of the character to be recognized is not specifically limited.
In the method, after each model to be trained performs character recognition on a target training sample, a character recognition result is obtained, and the character recognition result can represent the prediction probability that the character to be recognized corresponding to each region to be recognized in the target training sample is each preset character.
It should be noted that the preset characters may be the 26 uppercase and 26 lowercase English letters, or may be preset Chinese characters; that is, the preset characters are associated with the type of the characters to be recognized, which is not specifically limited in this application.
Step S206, determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results.
In the application, the label information is label information preset for the target training sample, and the label information is used for representing the actual probability that each character to be recognized in the target training sample is each preset character. Wherein, the actual probability is used to characterize the actual text sequence contained in the target training sample.
And S208, adjusting model parameters of the corresponding model to be trained according to the relative entropy loss value.
In the embodiment of the invention, a plurality of models to be trained and target training samples for the models to be trained are first obtained; character recognition processing is then performed on the target training samples through each model to be trained to obtain a plurality of character recognition results; next, a relative entropy loss value of each model to be trained is determined based on the plurality of character recognition results and the label information of the target training samples; finally, the model parameters of the corresponding model to be trained are adjusted according to the relative entropy loss value. According to the above description, in the present application, by calculating the relative entropy loss value of each model to be trained and training the models to be trained with these values, mutual learning among the plurality of models to be trained can be realized, so that any model to be trained acquires the functions or capabilities of the other models to be trained. This improves the adaptability of the models to be trained and the character recognition accuracy of the models, and solves the technical problem of poor recognition accuracy when existing character recognition models perform character recognition.
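Steps S202–S208 can be sketched for a single character position as follows. This is a toy Python sketch: it computes one relative entropy loss value per model from all models' recognition results and the label information (steps S204–S206); the direction of the KL terms and the summation over the other models' results are assumptions, and step S208 would pass each loss value to that model's optimizer, which is omitted here.

```python
import math

def mutual_learning_step(model_outputs, label, eps=1e-12):
    # model_outputs: one predicted distribution per model to be trained, for a
    # single character to be recognized; label: the actual distribution from
    # the target training sample's label information.
    def kl(p, q):
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    losses = []
    for i, p in enumerate(model_outputs):
        # Supervised term against the label information (assumed cross-entropy).
        ce = -sum(yi * math.log(pi + eps) for yi, pi in zip(label, p))
        # Mutual-learning term: KL against every other model's result.
        peers = [q for j, q in enumerate(model_outputs) if j != i]
        kl_term = sum(kl(q, p) for q in peers)
        losses.append(ce + kl_term)
    return losses
```

In a real training loop each loss would be backpropagated through its own model, so that each model to be trained gradually absorbs the behavior of the others.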
In an optional embodiment, in step S206, determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample includes the following processes:
step S2061, determining, among the plurality of character recognition results, the character recognition result of a first model to be trained and the character recognition result of a second model to be trained, to obtain a first character recognition result and a second character recognition result respectively. The first model to be trained is the model, among the plurality of models to be trained, whose relative entropy loss value is to be calculated at the current moment; the second model to be trained is a model, among the plurality of models to be trained, other than the first model to be trained.
Specifically, the present application takes one model among the plurality of models to be trained, referred to as the first model to be trained, as an example for explanation. The first model to be trained may be any one of the plurality of models to be trained, and the present application does not specifically limit this.
First, in the present application, a character recognition result of a first model to be trained is determined, and then, a character recognition result of a second model to be trained is determined.
Step S2062, calculating KL divergence between the first character recognition result and the second character recognition result to obtain a target KL divergence value.
In the present application, after the first character recognition result and the second character recognition result are determined, the KL divergence between them may be calculated. The calculation formula of the KL divergence can be described as:

KL(p‖q) = Σ_{i=1}^{n} p(x_i) · log( p(x_i) / q(x_i) )

in the formula, i runs from 1 to n in sequence, n is the number of preset characters, p(x_i) represents the probability that the character to be recognized in the first character recognition result is the ith character among the preset characters, and q(x_i) represents the probability that the character to be recognized in the second character recognition result is the ith character among the preset characters.
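The KL divergence formula above can be sketched numerically as follows (the function name and the eps smoothing term are illustrative assumptions, not part of the patent):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum over i of p(x_i) * log(p(x_i) / q(x_i));
    # eps guards against taking the log of zero.
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Predictions of two models for one character over four preset characters.
p1 = [0.7, 0.1, 0.1, 0.1]   # first character recognition result
p2 = [0.4, 0.3, 0.2, 0.1]   # second character recognition result
d = kl_divergence(p1, p2)   # target KL divergence value, > 0 when p1 != p2
```

Note that the divergence is asymmetric: kl_divergence(p1, p2) generally differs from kl_divergence(p2, p1), which is why each model later receives its own loss term.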
Step S2063, determining a relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample.
In the present application, after the target KL divergence value is determined and obtained in the manner described above, the relative entropy loss value of the first model to be trained may be determined based on the target KL divergence value, the first character recognition result, and the label information of the target training sample.
According to the description, the KL divergence is combined when the relative entropy loss value of the model to be trained is calculated, and the KL divergence can represent the difference degree between two probability distributions, so that the KL divergence is added to the loss function of the model to be trained, mutual learning among a plurality of models to be trained can be assisted, and the learning capacity of each model to be trained is enriched.
It should be noted that, in the present application, the number of the second models to be trained may be multiple; at this time, each second model to be trained corresponds to one second character recognition result.
Based on this, in the present application, calculating the KL divergence between the first character recognition result and the second character recognition results includes the following process:
and calculating KL divergence between the first character recognition result and each second character recognition result to obtain a plurality of target KL divergence values.
And if the number of the second models to be trained is multiple, each second model to be trained respectively performs character recognition processing on the target training sample to obtain a second character recognition result.
Based on this, in the present application, the KL divergence between the first character recognition result and each second character recognition result can be calculated, where a plurality of target KL divergence values will be obtained.
It should be noted that, in the present application, the formula KL(p‖q) = Σ_{i=1}^{n} p(x_i) · log( p(x_i) / q(x_i) ) described above can be adopted to calculate the KL divergence between the first character recognition result and each second character recognition result. It should be appreciated that the greater the number of second models to be trained, the stronger the learning ability of the first model to be trained.
In this application, after obtaining a plurality of target KL divergence values, the relative entropy loss value of the first model to be trained may be determined based on the plurality of target KL divergence values, the first character recognition result, and the label information of the target training sample.
It should be noted that, in the present application, the label information is preset, and is used to represent the actual probability that the character to be recognized in the target training sample is each preset character.
Based on this, determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample includes the following processes:
(1) summing the predicted probability of each character to be recognized in the first character recognition result and the actual probability of the corresponding character to be recognized in the label information, to obtain a target calculation result corresponding to each character to be recognized.
As can be seen from the above description, in the present application, the first character recognition result includes a probability distribution of each character to be recognized, where the probability distribution represents a prediction probability that each character to be recognized is a preset character. The label information also contains the probability distribution of each character to be recognized, wherein the probability distribution represents the actual probability that each character to be recognized is a preset character.
Based on this, in the application, the prediction probability of each character to be recognized and the actual probability of the corresponding character to be recognized can be summed to obtain the target calculation result.
If the preset characters are 26 english letters, the first character recognition result includes the prediction probability that each character to be recognized is 26 english letters. The label information contains the actual probability that each character to be recognized is 26 english letters. If the number of characters to be recognized is n, the first character recognition result and the label information may be a vector with a length of 26 × n. For each element in the length-26 x n vector, a summation calculation is performed, and the summation calculation result is determined as a target calculation result.
(2) summing the target calculation result corresponding to each character to be recognized and the target KL divergence value, to obtain the relative entropy loss value of the first model to be trained.

After the target calculation result is determined in the above manner, the target calculation result and the target KL divergence value may be summed to obtain the relative entropy loss value of the first model to be trained.
It should be noted that each character to be recognized corresponds to one KL divergence value vector containing m × 1 numerical values, where m is the number of preset characters. For each character to be recognized, the target calculation result and the numerical values in the KL divergence value vector may then be summed correspondingly to obtain the relative entropy loss value of the first model to be trained.
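A minimal sketch of the two summation steps above, assuming the probabilities and the KL divergence value vectors are stored as (number of characters) × m arrays (the helper name and array shapes are assumptions):

```python
import numpy as np

def relative_entropy_loss(pred_probs, label_probs, kl_vectors):
    # pred_probs / label_probs: (n_chars, m) predicted and actual
    # probabilities over m preset characters per character to recognize.
    # kl_vectors: one KL divergence value vector (m values) per character.
    target = pred_probs + label_probs           # target calculation result
    return float(np.sum(target + kl_vectors))   # summed with the KL vector

rng = np.random.default_rng(0)
n_chars, m = 3, 26
loss = relative_entropy_loss(rng.random((n_chars, m)),
                             rng.random((n_chars, m)),
                             rng.random((n_chars, m)))
```

This mirrors the element-wise summation described in the text; in practice the per-position results would be reduced to a scalar before backpropagation, as done here with the final sum.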
In an optional embodiment of the present application, calculating the KL divergence between the first character recognition result and the second character recognition result comprises:
(1) transforming the first character recognition result to obtain a first logits vector, and transforming the second character recognition result to obtain a second logits vector;
(2) calculating a KL divergence between the first and second logits vectors.
Specifically, in the present application, after the target training samples are respectively subjected to the character recognition processing by each model to be trained to obtain a plurality of character recognition results, the first character recognition result may be transformed, where the transformation formula may be described as:
z = log( p / (1 − p) )

in the equation, p represents each prediction probability in the first character recognition result or, correspondingly, each prediction probability in the second character recognition result, and z is the resulting logits value.
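A sketch of such a probability-to-logits transform, assuming the standard per-element logit function z = log(p / (1 − p)) (the clipping bound and function name are illustrative assumptions):

```python
import numpy as np

def to_logits(p, eps=1e-12):
    # Assumed per-element logit transform z = log(p / (1 - p));
    # clipping keeps the log well-defined at probabilities of 0 or 1.
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

z = to_logits([0.5, 0.9, 0.1])  # z[0] = 0, z[1] > 0, z[2] < 0
```

The transform is monotone, so the ordering of the prediction probabilities is preserved in the logits vector.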
The above process is described below with reference to fig. 3.
As shown in FIG. 3, Base model θ1 is the first model to be trained and Base model θ2 is the second model to be trained. First, Base model θ1 and Base model θ2 each perform character recognition processing on the target training sample to obtain a first character recognition result p1 and a second character recognition result p2. Then, the first character recognition result p1 and the second character recognition result p2 are respectively transformed according to the transformation formula above to obtain a first logits vector and a second logits vector, and the target KL divergence values KL(p1‖p2) and KL(p2‖p1) are calculated based on the two logits vectors. Next, the relative entropy loss function loss1 of Base model θ1 is calculated by combining the first character recognition result p1, the target KL divergence value KL(p1‖p2), and the label information of the target training sample; likewise, the relative entropy loss function loss2 of Base model θ2 is calculated by combining the second character recognition result p2, the target KL divergence value KL(p2‖p1), and the label information of the target training sample.
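The two-model flow of FIG. 3 can be sketched as follows; cross-entropy is used here as a stand-in for the label-information term, and all function names and shapes are illustrative assumptions rather than the patent's exact formulation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # Per-row KL divergence KL(p || q).
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

def mutual_losses(p1, p2, labels):
    # loss1 / loss2 each combine the model's own prediction, the KL
    # divergence toward the other model's prediction, and the label
    # information (cross-entropy stand-in for the label term).
    ce1 = -np.sum(labels * np.log(p1 + 1e-12), axis=-1)
    ce2 = -np.sum(labels * np.log(p2 + 1e-12), axis=-1)
    loss1 = float(np.mean(ce1 + kl(p1, p2)))   # for Base model theta1
    loss2 = float(np.mean(ce2 + kl(p2, p1)))   # for Base model theta2
    return loss1, loss2

# Two characters over four preset characters, identical predictions.
labels = np.eye(4)[[0, 1]]          # one-hot label information
p = softmax(np.zeros((2, 4)))       # uniform predictions from both models
l1, l2 = mutual_losses(p, p, labels)
```

When the two models agree exactly, the KL terms vanish and each loss reduces to its own label term, so neither model is pulled away from the shared prediction.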
As can be seen from the above, by calculating the relative entropy loss value of each model to be trained and training the models with this loss value, mutual learning among the plurality of models to be trained can be realized, so that any one model to be trained acquires functions or capabilities of the other models to be trained. This improves the adaptability of the models to be trained and their character recognition accuracy, and solves the technical problem of poor recognition accuracy when an existing character recognition model is used for character recognition.
Example 3:
According to an embodiment of the present invention, an embodiment of a training method for a network model is provided.
Fig. 4 is a flowchart of a method for training a network model according to an embodiment of the present invention, and as shown in fig. 4, the method includes the following steps:
step S402, obtaining a plurality of models to be trained and a plurality of target training sample groups of the models to be trained; the determination modes of the sample labels in different target training sample groups are different, and the determination modes of the sample labels are associated with the types of the models to be trained;
step S404, training each model to be trained with each target training sample group by the method described in embodiment 1 above;
step S406, testing the trained model to be trained by using a target test set to obtain a plurality of model test results, wherein one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model;
step S408, determining the balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute a plurality of operations;
and step S410, training each model to be trained again based on the balance ability index.
As can be seen from the above description, in the present application, the ability of each model to be trained to perform multiple operations can be determined by calculating the balance ability index of the model to be trained. By adopting the method, the balance capability of the model among a plurality of functions can be enhanced on the basis of improving the character recognition accuracy of the model.
In the present embodiment, it is assumed that the plurality of models to be trained includes the following types of models: a neural network model based on an attention mechanism, and an image segmentation network model. The plurality of target training samples includes: a first type of target training sample and a second type of target training sample.
Based on this, the label information of each kind of target training sample can be determined in the following manner, specifically including:
firstly, randomly generating the probability that each character to be recognized is each preset character for the first type of target training sample to obtain the label information of the first type of target training sample.
In an alternative embodiment, assuming that the number of preset characters is n, the probability that each character to be recognized is any given preset character can be randomly generated as 1/n for the first type of target training sample. For example, if the preset characters are the 26 English letters, the probability may be 1/26.
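The uniform label generation for the first type of target training sample can be sketched as follows (the function name and array layout are assumptions):

```python
import numpy as np

def uniform_labels(n_chars, n_preset):
    # Each character to be recognized is assigned probability 1/n for
    # every one of the n preset characters (first-type training sample).
    return np.full((n_chars, n_preset), 1.0 / n_preset)

labels = uniform_labels(5, 26)  # 5 characters, 26 English letters
```

Each row is a valid probability distribution summing to 1, with every preset character equally likely, so the label carries no language-sequence information by construction.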
Then, a target test set used for testing the plurality of models to be trained is obtained, and label information of a second type of target training sample is determined based on the label information of the target test set.
Specifically, in the present application, a target test set is first obtained, where the target test set may be any one or more of the following: ICDAR test set, SVT test set, CUTE test set, IIIT test set.
After the target test set is obtained, a test sample can be determined in the target test set, wherein the character sequence contained in the test sample is the same as the character sequence contained in the second type of target training sample. Then, the label information of the test sample is determined as the label information of the second type of target training sample.
In addition, the label information of the test sample can be transformed according to a preset writing format to obtain transformed label information, and the transformed label information is determined as the label information of the second type of target training sample.
It should be noted that, in the present application, in order to adapt to various case distributions, each piece of label information is generated in triplicate according to the same rule: full capitalization, first-letter capitalization, and full lowercase.
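The triplicate case rule above amounts to the following (the function name is an assumption):

```python
def triplicate_case_labels(text):
    # Full capitalization, first-letter capitalization, full lowercase.
    return [text.upper(), text.capitalize(), text.lower()]

variants = triplicate_case_labels("hello")  # three case variants of one label
```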
It should be further noted that, in the present application, a tag generation engine may be preset, and then the tag generation step is executed by the tag generation engine.
According to the above description, the label information of the first type of target training sample is randomly generated. After the model to be trained is trained with this type of target training sample, the model can be tested with the target test set, which reflects the model's ability to recognize image features.
The label information of the second type of target training sample is generated based on the label information of the target test set, and after the model to be trained is trained through the target training sample, the model is tested through the target test set, so that the modeling capacity of the model on the language sequence can be embodied.
In an optional embodiment, the length of the label information of the first type of target training sample is the same as the length of the label information of the second type of target training sample, and the length of the label information of the second type of target training sample is the same as the length of the label information of the target test set.
As shown in fig. 5, the label information of the left six target training samples is derived from the label information of the test set, and the label information of the right six target training samples is derived from pure random generation and is consistent with the length distribution of the label information of the left six target training samples.
In an alternative embodiment of the present application, if the plurality of model test results includes a first model test result and a second model test result, determining the balance ability index of each model to be trained based on the model test results includes the following process:

calculating a difference between the first model test result and the second model test result, and determining the difference as the balance ability index.
In this application, the first model test result may represent the accuracy of the model after it is tested by the target test set after the model to be trained is trained by one target training sample set. Likewise, the second model test result may represent the accuracy of the model after it has been tested by the target test set after it has been trained by another target training sample set on the model to be trained.
After the above two accuracy rates (i.e., the first model test result and the second model test result) are obtained, the difference between them may be calculated. If the difference is greater than a preset difference and the first model test result is greater than the second model test result, the balance ability of the model to be trained is poor.
That is, in the present application, if the difference is greater than the preset difference, each of the models to be trained is trained again.
By the training method, the balance capability of the model to be trained can be adjusted, and the character recognition precision of the model is improved.
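The balance ability check described above can be sketched as follows (function names and the threshold parameter are assumptions):

```python
def balance_index(first_result, second_result):
    # Balance ability index: the difference between the two model test
    # results (character recognition accuracies on the target test set).
    return first_result - second_result

def needs_retraining(first_result, second_result, preset_difference):
    # Retrain when the difference exceeds the preset difference and the
    # first model test result is the larger of the two.
    return (balance_index(first_result, second_result) > preset_difference
            and first_result > second_result)
```

For example, accuracies of 0.9 and 0.7 with a preset difference of 0.1 would flag the model for retraining, while 0.8 and 0.79 would not.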
In the method, an image to be recognized is first obtained; character recognition is then performed on the image to be recognized through a target neural network to obtain a character recognition result, where the target neural network is a model determined by the method described in embodiment 1 or embodiment 2.
In the application, when the target neural network obtained by using the training method described in embodiment 1 or embodiment 2 identifies an image to be identified, the character identification accuracy can be improved, and the technical problem that the existing character identification model is poor in identification accuracy in the character identification process is solved.
Example 4:
the embodiment of the present invention further provides a training device for a network model, where the training device for a network model is mainly used to execute the training method for a network model provided in the foregoing content of the embodiment of the present invention, and the following describes the training device for a network model provided in the embodiment of the present invention in detail.
Fig. 6 is a schematic diagram of a training apparatus for a network model according to an embodiment of the present invention, as shown in fig. 6, the training apparatus for a network model mainly includes a first obtaining unit 10, a recognition processing unit 20, a first determining unit 30 and an adjusting unit 40, where:
a first obtaining unit 10, configured to obtain a plurality of models to be trained and target training samples of the plurality of models to be trained;
the recognition processing unit 20 is configured to perform character recognition processing on the target training sample through each to-be-trained model to obtain a plurality of character recognition results, where each character recognition result represents a prediction probability that each to-be-recognized character in the target training sample is a preset character;
a first determining unit 30, configured to determine a relative entropy loss value of each model to be trained based on the multiple character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results;
and the adjusting unit 40 is used for adjusting the model parameters of the corresponding model to be trained through the relative entropy loss value.
According to the description, in the application, through calculating the relative entropy loss value of the model to be trained, and through the mode that the model to be trained is trained through the relative entropy loss value, mutual learning among a plurality of models to be trained can be realized, so that any model to be trained has functions or capabilities of other models to be trained, the adaptability of the model to be trained is improved, the character recognition accuracy of the model is improved, and the technical problem that the recognition accuracy is poor when the existing character recognition model is used for character recognition is solved.
Optionally, the first determining unit is configured to: determining a character recognition result of a first model to be trained and a character recognition result of a second model to be trained in the plurality of character recognition results, and respectively obtaining a first character recognition result and a second character recognition result, wherein the first model to be trained is a model of the plurality of models to be trained, of which the relative entropy loss value is to be calculated at the current moment, and the second model to be trained is the other model except the first model to be trained; calculating KL divergence between the first character recognition result and the second character recognition result to obtain a target KL divergence value; and determining a relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample.
Optionally, the first determining unit is further configured to: calculating the KL divergence between the first and second word recognition results comprises: calculating KL divergence between the first character recognition result and each second character recognition result to obtain a plurality of target KL divergence values; determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises: and determining a relative entropy loss value of the first model to be trained based on the plurality of target KL divergence values, the first character recognition result and the label information of the target training sample.
Optionally, the first determining unit is further configured to: determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises: summing the prediction probability of each character to be recognized in the first character recognition result and the actual probability of the corresponding character to be recognized in the label information to obtain a target calculation result corresponding to each character to be recognized; and summing the target calculation result corresponding to each character to be recognized and the target KL divergence value to obtain a relative entropy loss value of the first model to be trained.
Optionally, the first determining unit is further configured to: transforming the first character recognition result to obtain a first logits vector; and transforming the second character recognition result to obtain a second logits vector; calculating a KL divergence between the first and second logits vectors.
Optionally, the plurality of models to be trained comprises the following types of models: a neural network model based on an attention mechanism and an image segmentation network model.
Example 5:
the embodiment of the present invention further provides another training apparatus for a network model, where the training apparatus for a network model is mainly used for executing the training method for a network model provided in the foregoing content of the embodiment of the present invention, and the following describes the training apparatus for a network model provided in the embodiment of the present invention in detail.
Fig. 7 is a schematic diagram of a training apparatus for a network model according to an embodiment of the present invention, as shown in fig. 7, the training apparatus for a network model mainly includes a second obtaining unit 50, a first training unit 60, a testing unit 70, a second determining unit 80, and a second training unit 90, where:
a second obtaining unit 50, configured to obtain a plurality of models to be trained and a plurality of target training sample sets of the plurality of models to be trained; the determination modes of the sample labels in different target training sample groups are different, and the determination modes of the sample labels are associated with the types of the models to be trained;
a first training unit 60, configured to train each model to be trained with each target training sample group by the method described in embodiment 1 above;
the test unit 70 is configured to test the trained model to be trained by using a target test set to obtain a plurality of model test results, where one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model;
a second determining unit 80, configured to determine a balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute a plurality of operations;
and a second training unit 90, configured to train each model to be trained again based on the balance ability index.
Optionally, the plurality of models to be trained comprises the following types of models: a neural network model and an image segmentation network model based on an attention mechanism; the plurality of target training sample sets comprises: a first type of target training sample and a second type of target training sample; the apparatus is also configured to: determining label information of each kind of target training sample by the following method, specifically comprising: randomly generating the probability that each character to be recognized is each preset character for the first type of target training sample to obtain the label information of the first type of target training sample; and acquiring a target test set for testing the plurality of models to be trained, and determining label information of a second type of target training sample based on the label information of the target test set.
Optionally, the length of the label information of the first type of target training sample is the same as the length of the label information of the second type of target training sample, and the length of the label information of the second type of target training sample is the same as the length of the label information of the target test set.
Optionally, the apparatus is further configured to: determining a test sample in the target test set, wherein the text sequence contained in the test sample is the same as the text sequence contained in the second type of target training sample; determining label information for the second type of target training sample based on the label information for the test sample.
Optionally, the apparatus is further configured to: determining label information of the test sample as label information of the second type of target training sample; and/or converting the label information of the test sample according to a preset writing format to obtain converted label information, and determining the converted label information as the label information of the second type of target training sample.
Optionally, the plurality of model test results comprise: a first model test result and a second model test result; the second determination unit is configured to: calculating a difference between the first model test result and the second model test result, and determining the difference as the balance ability index; the second training unit is to: and if the difference is larger than the preset difference, training each model to be trained again.
The embodiment of the invention also provides a character recognition device. The character recognition device mainly comprises a third acquisition unit and a character recognition unit, wherein:
a third acquisition unit configured to acquire an image to be recognized;
and the character recognition unit is used for performing character recognition on the image to be recognized through a target neural network to obtain a character recognition result, wherein the target neural network is a model obtained by training by adopting the method in any one of the embodiment 1 or the embodiment 2.
In the application, when the target neural network obtained by using the training method described in embodiment 1 or embodiment 2 identifies an image to be identified, the character identification accuracy can be improved, and the technical problem that the existing character identification model is poor in identification accuracy in the character identification process is solved.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processing device. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. A method for training a network model, comprising:
obtaining a plurality of models to be trained and target training samples of the plurality of models to be trained;
respectively performing character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character;
determining a relative entropy loss value of each model to be trained based on the plurality of character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results;
and adjusting the model parameters of the corresponding model to be trained according to the relative entropy loss value.
2. The method of claim 1, wherein determining a relative entropy loss value for each model to be trained based on the plurality of word recognition results and label information for the target training samples comprises:
determining, from the plurality of character recognition results, a character recognition result of a first model to be trained and a character recognition result of a second model to be trained, to obtain a first character recognition result and a second character recognition result respectively, wherein the first model to be trained is the model, among the plurality of models to be trained, whose relative entropy loss value is to be calculated at the current moment, and the second model to be trained is a model other than the first model to be trained;
calculating KL divergence between the first character recognition result and the second character recognition result to obtain a target KL divergence value;
and determining a relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample.
3. The method according to claim 2, wherein there are a plurality of second models to be trained, and each second model to be trained corresponds to a second character recognition result;
calculating the KL divergence between the first and second word recognition results comprises: calculating KL divergence between the first character recognition result and each second character recognition result to obtain a plurality of target KL divergence values;
determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises: and determining a relative entropy loss value of the first model to be trained based on the plurality of target KL divergence values, the first character recognition result and the label information of the target training sample.
4. The method according to claim 2 or 3, wherein the label information is used for representing the actual probability that each character to be recognized in the target training sample is each preset character;
determining the relative entropy loss value of the first model to be trained based on the target KL divergence value, the first character recognition result and the label information of the target training sample comprises:
summing the prediction probability of each character to be recognized in the first character recognition result and the actual probability of the corresponding character to be recognized in the label information to obtain a target calculation result corresponding to each character to be recognized;
and summing the target calculation result corresponding to each character to be recognized and the target KL divergence value to obtain a relative entropy loss value of the first model to be trained.
5. The method of claim 2, wherein calculating the KL divergence between the first and second word recognition results comprises:
transforming the first character recognition result to obtain a first logits vector; and transforming the second character recognition result to obtain a second logits vector;
calculating KL divergence between the first and second logits vectors to obtain the target KL divergence value.
6. The method of claim 1, wherein the plurality of models to be trained comprise the following types of models: a neural network model based on an attention mechanism and an image segmentation network model.
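The relative-entropy loss defined in claims 1 to 5 can be sketched in NumPy as follows. This is a minimal illustration, not the patent's exact formulation: the function names, array shapes, and the use of one-hot label information are assumptions made for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two models' prediction probabilities, summed over
    # the preset characters and averaged over the characters to be
    # recognized; p and q have shape (num_positions, num_preset_chars).
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

def cross_entropy(p, labels, eps=1e-12):
    # Supervised term against the label information, here assumed to be
    # one-hot actual probabilities of each preset character.
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.sum(labels * np.log(p), axis=-1)))

def relative_entropy_loss(first_probs, peer_probs, labels):
    # Loss for the first model to be trained: supervised term plus the KL
    # divergence against every second (peer) model, per claims 2 and 3.
    loss = cross_entropy(first_probs, labels)
    loss += sum(kl_divergence(first_probs, q) for q in peer_probs)
    return loss
```

When the peer models agree exactly with the first model, the KL term vanishes and only the supervised term remains, which matches the intuition that the loss measures the degree of difference between the character recognition results.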
7. A method for training a network model, comprising:
obtaining a plurality of models to be trained and a plurality of target training sample groups for the plurality of models to be trained, wherein the sample labels in different target training sample groups are determined in different manners, and the manner of determining the sample labels is associated with the type of the model to be trained;
training each model to be trained with each target training sample group by the method of any one of the preceding claims 1 to 6;
testing the trained model to be trained by using a target test set to obtain a plurality of model test results, wherein one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model;
determining a balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute various operations;
and training each model to be trained again based on the balance ability index.
8. The method of claim 7, wherein the plurality of models to be trained comprise the following types of models: a neural network model based on an attention mechanism and an image segmentation network model; and the plurality of target training sample groups comprise: a first type of target training sample and a second type of target training sample;
wherein the label information of each type of target training sample is determined as follows:
randomly generating the probability that each character to be recognized is each preset character for the first type of target training sample to obtain the label information of the first type of target training sample;
and acquiring a target test set for testing the plurality of models to be trained, and determining label information of a second type of target training sample based on the label information of the target test set.
9. The method of claim 8, wherein the length of the label information of the first type of target training sample is the same as the length of the label information of the second type of target training sample, and the length of the label information of the second type of target training sample is the same as the length of the label information of the target test set.
10. The method of claim 8, wherein determining label information for the second type of target training sample based on the label information for the target test set comprises:
determining a test sample in the target test set, wherein the text sequence contained in the test sample is the same as the text sequence contained in the second type of target training sample;
determining label information for the second type of target training sample based on the label information for the test sample.
11. The method of claim 10, wherein determining the label information of the second type of target training sample based on the label information of the test sample comprises:
determining label information of the test sample as label information of the second type of target training sample;
and/or
converting the label information of the test sample according to a preset writing format to obtain converted label information, and determining the converted label information as the label information of the second type of target training sample.
12. The method of claim 7, wherein the plurality of model test results comprises: a first model test result and a second model test result;
determining the balance ability index of each model to be trained based on the model test result comprises: calculating a difference between the first model test result and the second model test result, and determining the difference as the balance ability index;
training each model to be trained again based on the balance ability index comprises: and if the difference is larger than the preset difference, training each model to be trained again.
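Claim 12 reduces the balance ability index to a gap check between two model test results. A minimal sketch, assuming the test results are accuracy values in [0, 1] and using a hypothetical preset difference of 0.05 (the patent leaves the threshold unspecified):

```python
def balance_ability_index(first_result: float, second_result: float) -> float:
    # Claim 12: the index is the difference between the first and second
    # model test results (character recognition accuracies on the test set).
    return abs(first_result - second_result)

def should_retrain(first_result: float, second_result: float,
                   preset_difference: float = 0.05) -> bool:
    # Train every model to be trained again when the gap exceeds the preset
    # difference; 0.05 is an illustrative threshold, not from the patent.
    return balance_ability_index(first_result, second_result) > preset_difference
```

A large gap indicates that the models' abilities to execute the various operations are unbalanced, which triggers the second training stage of claim 7.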
13. A method for recognizing a character, comprising:
acquiring an image to be recognized;
and performing character recognition on the image to be recognized through a target neural network to obtain a character recognition result, wherein the target neural network is a model obtained by training by adopting the method of any one of claims 1 to 12.
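Since each character recognition result holds a prediction probability for every preset character, the final text of claim 13 can be obtained by a greedy arg-max decode over the network's output. A minimal sketch with a hypothetical preset character set; the patent does not specify the decoding step:

```python
import numpy as np

def decode_characters(probs, preset_chars):
    # Greedy decoding: for each character to be recognized, pick the preset
    # character with the highest prediction probability.
    # probs has shape (num_positions, len(preset_chars)).
    return "".join(preset_chars[int(i)] for i in np.argmax(probs, axis=-1))
```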
14. An apparatus for training a network model, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a control unit, wherein the first acquisition unit is used for acquiring a plurality of models to be trained and target training samples of the plurality of models to be trained;
the recognition processing unit is used for respectively carrying out character recognition processing on the target training sample through each model to be trained to obtain a plurality of character recognition results, wherein each character recognition result represents the prediction probability that each character to be recognized in the target training sample is each preset character;
a first determining unit, configured to determine a relative entropy loss value of each model to be trained based on the multiple character recognition results and the label information of the target training sample; wherein the relative entropy loss value is used for representing the difference degree between a plurality of character recognition results;
and the adjusting unit is used for adjusting the model parameters of the corresponding model to be trained according to the relative entropy loss value.
15. An apparatus for training a network model, comprising:
the second acquisition unit is used for acquiring a plurality of models to be trained and a plurality of target training sample sets of the models to be trained; the determination modes of the sample labels in different target training sample groups are different, and the determination modes of the sample labels are associated with the types of the models to be trained;
a first training unit for training each model to be trained by the method of any one of the preceding claims 1 to 6 with each target training sample set;
the test unit is used for testing the trained model to be trained by utilizing a target test set to obtain a plurality of model test results, wherein one target training sample group corresponds to one model test result; the model test result is used for representing the accuracy of character recognition of the model;
a second determining unit, configured to determine a balance ability index of each model to be trained based on the model test result; the balance ability index is used for measuring the ability of each model to be trained to execute a plurality of operations;
and the second training unit is used for training each model to be trained again based on the balance ability index.
16. A character recognition apparatus, comprising:
a third acquisition unit configured to acquire an image to be recognized;
a character recognition unit, configured to perform character recognition on the image to be recognized through a target neural network to obtain a character recognition result, where the target neural network is a model obtained by training according to the method of any one of claims 1 to 12.
17. An electronic device comprising a memory, a processing device and a computer program stored on the memory and executable on the processing device, wherein the processing device implements the steps of the method of any of the preceding claims 1 to 6, or implements the steps of the method of any of the preceding claims 7 to 12, or implements the steps of the method of claim 13 when executing the computer program.
18. A computer readable medium having non-volatile program code executable by a processing device, characterized in that the program code causes the processing device to perform the steps of the method of any of the preceding claims 1 to 6, or to carry out the steps of the method of any of the preceding claims 7 to 12, or to carry out the steps of the method of claim 13.
CN202010330213.6A 2020-04-23 2020-04-23 Network model training and character recognition method and device and electronic equipment Pending CN111667066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010330213.6A CN111667066A (en) 2020-04-23 2020-04-23 Network model training and character recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111667066A true CN111667066A (en) 2020-09-15

Family

ID=72382904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010330213.6A Pending CN111667066A (en) 2020-04-23 2020-04-23 Network model training and character recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111667066A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN108920622A (en) * 2018-06-29 2018-11-30 北京奇艺世纪科技有限公司 A kind of training method of intention assessment, training device and identification device
CN109086871A (en) * 2018-07-27 2018-12-25 北京迈格威科技有限公司 Training method, device, electronic equipment and the computer-readable medium of neural network
US20190102678A1 (en) * 2017-09-29 2019-04-04 Samsung Electronics Co., Ltd. Neural network recogntion and training method and apparatus
CN109960808A (en) * 2019-03-26 2019-07-02 广东工业大学 A kind of text recognition method, device, equipment and computer readable storage medium
CN110059828A (en) * 2019-04-23 2019-07-26 杭州智趣智能信息技术有限公司 A kind of training sample mask method, device, equipment and medium
CN110209817A (en) * 2019-05-31 2019-09-06 安徽省泰岳祥升软件有限公司 Training method, device and the text handling method of text-processing model
CN110858307A (en) * 2018-08-24 2020-03-03 国信优易数据有限公司 Character recognition model training method and device and character recognition method and device
CN111027345A (en) * 2018-10-09 2020-04-17 北京金山办公软件股份有限公司 Font identification method and apparatus

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270316B (en) * 2020-09-23 2023-06-20 北京旷视科技有限公司 Character recognition, training method and device of character recognition model and electronic equipment
CN112270316A (en) * 2020-09-23 2021-01-26 北京旷视科技有限公司 Character recognition method, character recognition model training method, character recognition device, and electronic equipment
CN112364860A (en) * 2020-11-05 2021-02-12 北京字跳网络技术有限公司 Training method and device of character recognition model and electronic equipment
CN112784903A (en) * 2021-01-26 2021-05-11 上海明略人工智能(集团)有限公司 Method, device and equipment for training target recognition model
CN112784903B (en) * 2021-01-26 2023-12-12 上海明略人工智能(集团)有限公司 Method, device and equipment for training target recognition model
CN112990429A (en) * 2021-02-01 2021-06-18 深圳市华尊科技股份有限公司 Machine learning method, electronic equipment and related product
CN113128220A (en) * 2021-04-30 2021-07-16 北京奇艺世纪科技有限公司 Text distinguishing method and device, electronic equipment and storage medium
CN113128220B (en) * 2021-04-30 2023-07-18 北京奇艺世纪科技有限公司 Text discrimination method, text discrimination device, electronic equipment and storage medium
WO2023273985A1 (en) * 2021-06-30 2023-01-05 北京有竹居网络技术有限公司 Method and apparatus for training speech recognition model and device
CN113470626A (en) * 2021-06-30 2021-10-01 北京有竹居网络技术有限公司 Training method, device and equipment of voice recognition model
CN113420689A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Character recognition method and device based on probability calibration, computer equipment and medium
CN113470626B (en) * 2021-06-30 2024-01-26 北京有竹居网络技术有限公司 Training method, device and equipment for voice recognition model
CN113420689B (en) * 2021-06-30 2024-03-22 平安科技(深圳)有限公司 Character recognition method, device, computer equipment and medium based on probability calibration
CN113469188A (en) * 2021-07-15 2021-10-01 有米科技股份有限公司 Method and device for data enhancement and character recognition of character recognition model training
CN114937267A (en) * 2022-04-20 2022-08-23 北京世纪好未来教育科技有限公司 Training method and device for text recognition model and electronic equipment
CN114937267B (en) * 2022-04-20 2024-04-02 北京世纪好未来教育科技有限公司 Training method and device for text recognition model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination