CN111612081A - Recognition model training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111612081A
CN111612081A (application CN202010453100.5A; granted as CN111612081B)
Authority
CN
China
Prior art keywords
image
training
recognition
trained
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010453100.5A
Other languages
Chinese (zh)
Other versions
CN111612081B (en)
Inventor
张�杰
邹雨晗
徐倩
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010453100.5A priority Critical patent/CN111612081B/en
Publication of CN111612081A publication Critical patent/CN111612081A/en
Application granted granted Critical
Publication of CN111612081B publication Critical patent/CN111612081B/en
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition

Abstract

The invention discloses a training method, apparatus, device and storage medium for a recognition model, relating to the field of financial technology. The training method comprises the following steps: acquiring an image to be trained, and constructing a simulation image according to the image to be trained; determining a training data set according to the image to be trained and the simulation image; and, based on the training data set, training a generative adversarial network and a recognition network in a preset neural network model to obtain a recognition model. By constructing simulation images from the acquired images to form the training data set, the method avoids the low recognition accuracy that would result from an insufficient number of training samples, and thereby improves the recognition accuracy of the trained recognition model.

Description

Recognition model training method, device, equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence in financial technology (Fintech), and in particular to a training method, apparatus, device and storage medium for a recognition model.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Artificial intelligence is no exception; however, the financial industry's requirements for security and real-time performance place correspondingly higher demands on artificial intelligence technology.
Deep-learning-based optical character recognition (OCR) relies on a large amount of annotated data. In practical applications, on the one hand, real data in a specific field (such as financial document data) is expensive and scarce, and sometimes even raises privacy-protection concerns, so it is often difficult to acquire in large quantities; on the other hand, labeling data is time-consuming, labor-intensive and costly, and labels are sometimes wrong. According to Zipf's law, the frequency of a word in a natural-language corpus is inversely proportional to its rank in the frequency table, so even with a large amount of labeled data, the proportion of uncommon words remains far from sufficient. Some uncommon words encountered once an OCR system is actually online may not appear in the training set at all. The sparsity of uncommon-word data limits the accuracy of an OCR system mainly in two respects: (a) uncommon words that do not appear in the training set cannot be recognized; (b) uncommon words that do appear in the training set are insufficiently trained because of the extreme imbalance in sample counts, so their recognition results are inaccurate. In particular, because its sample count is very small, an uncommon word is easily recognized as a visually similar common word with many samples.
Therefore, the recognition accuracy of current recognition models is low.
Disclosure of Invention
The main object of the invention is to provide a training method, apparatus, device and storage medium for a recognition model, aiming to solve the technical problem of the low recognition accuracy of existing recognition models.
In order to achieve the above object, the present invention provides a training method of a recognition model, including the steps of:
acquiring an image to be trained, and constructing a simulation image according to the image to be trained;
determining a training data set according to the image to be trained and the simulation image;
and, based on the training data set, training a generative adversarial network and a recognition network in a preset neural network model to obtain a recognition model.
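The three steps above can be sketched as a minimal training pipeline. The following Python skeleton is purely illustrative: every function name is a hypothetical placeholder, not an interface defined by the invention, and the training step is a stub standing in for the adversarial training described later.

```python
# Hypothetical sketch of the three-step training method; all names are
# illustrative placeholders, not APIs defined by the patent.

def build_simulation_images(images_to_train):
    """Step 1: construct one simulation image per real image (stub)."""
    return [("sim", img) for img in images_to_train]

def build_training_set(images_to_train, simulation_images):
    """Step 2: mix real and simulated samples into one training data set."""
    return [("real", img) for img in images_to_train] + simulation_images

def train_recognition_model(training_set):
    """Step 3: would jointly train the GAN branch and recognition network."""
    # Placeholder: a real implementation would run the adversarial
    # optimization described in steps c and d of the description.
    return {"num_samples": len(training_set)}

images = ["img0", "img1", "img2"]
sims = build_simulation_images(images)
model = train_recognition_model(build_training_set(images, sims))
```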
Optionally, the recognition model is a character recognition model, and the step of constructing a simulation image according to the image to be trained includes:
acquiring label characters corresponding to the image to be trained, and generating a target corpus containing uncommon words according to the label characters;
and acquiring a background image corresponding to the image to be trained, and constructing according to the target corpus and the background image to obtain a simulation image.
Optionally, the step of obtaining the label characters corresponding to the image to be trained and generating a target corpus containing uncommon words according to the label characters comprises:
obtaining label characters corresponding to the image to be trained, and constructing according to the label characters to obtain an original corpus;
determining uncommon words corresponding to the original corpus and acquiring contexts corresponding to the uncommon words;
and adding the uncommon word and the context corresponding to the uncommon word into the original corpus to obtain a target corpus.
Optionally, after the step of determining the uncommon word corresponding to the original corpus, the method further includes:
and inserting the uncommon word into the label characters to update the label characters in the original corpus so as to obtain a target corpus.
Optionally, the step of obtaining a background image corresponding to the image to be trained, and constructing a simulation image according to the target corpus and the background image includes:
acquiring a background image without corresponding characters in the image to be trained, and acquiring a target character string corresponding to the background image from the target corpus;
determining the simulation font of the label characters in the simulation image according to the character font corresponding to the target character string;
and embedding the target character string into the corresponding background image in the form of the simulation font so as to construct a simulation image.
Optionally, the step of embedding the target text string into the corresponding background image in the form of the simulated font to construct a simulated image includes:
embedding the target character string into the background image in the form of the simulation font to obtain an initial image;
and carrying out noise processing on the initial image to construct and obtain a simulation image.
Optionally, the step of training a generative adversarial network and a recognition network in a preset neural network model based on the training data set to obtain a recognition model comprises:
fixing the generative adversarial network in the preset neural network model, and optimizing the recognition network in the neural network model with a gradient descent algorithm based on the training data set, so that the generative adversarial network judges the simulation images in the training data set to be real data according to the hidden-layer data obtained through it, wherein the generative adversarial network is a branch of the recognition network;
fixing the recognition network, and using a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set to be simulation data and the images to be trained to be real data, thereby obtaining a recognition model through training.
Optionally, the recognition model is an optical character recognition (OCR) model, and after the step of training a generative adversarial network and a recognition network in a preset neural network model based on the training data set to obtain a recognition model, the method further comprises:
after receiving an image to be recognized, inputting the image to be recognized into the OCR recognition model;
and determining characters in the image to be recognized according to the output result of the OCR recognition model.
In order to achieve the above object, the present invention also provides a training apparatus for a recognition model, including:
the acquisition module is used for acquiring an image to be trained;
the construction module is used for constructing a simulation image according to the image to be trained;
the determining module is used for determining a training data set according to the image to be trained and the simulation image;
and the training module is used for training a generative adversarial network and a recognition network in a preset neural network model based on the training data set to obtain a recognition model.
In addition, in order to achieve the above object, the present invention further provides a training device for a recognition model, which includes a memory, a processor, and a training program for a recognition model stored in the memory and operable on the processor, wherein the training program for a recognition model, when executed by the processor, implements the steps of the training method of a recognition model as described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon a training program of a recognition model, which when executed by a processor, implements the steps of the training method of a recognition model as described above.
The method comprises: obtaining an image to be trained, constructing a simulation image according to the image to be trained, determining a training data set according to the image to be trained and the simulation image, and, based on the training data set, training a generative adversarial network and a recognition network in a preset neural network model to obtain a recognition model. Because the training data set is obtained by constructing simulation images from the acquired images, the low recognition accuracy that would result from an insufficient number of training samples is avoided; that is, the recognition accuracy of the trained recognition model is improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a training method for recognition models according to the present invention;
FIG. 2 is a schematic flow chart of obtaining a recognition model according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a training method for recognition models according to a third embodiment of the present invention;
FIG. 4 is a block diagram of a preferred embodiment of the training apparatus for recognition models of the present invention;
FIG. 5 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a training method of a recognition model, and referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the training method of the recognition model of the invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein.
The training method of the recognition model is applied to a server or a terminal; the terminal may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer or a personal digital assistant (PDA), or a fixed terminal such as a digital TV or a desktop computer. In the embodiments of the training method of the recognition model, for convenience of description, the executing subject is omitted. The training method of the recognition model comprises the following steps:
and step S10, acquiring an image to be trained, and constructing a simulation image according to the image to be trained.
An image to be trained is acquired, and a simulation image is constructed according to it. Compared with the simulation image, the image to be trained is a real image. The images to be trained can be pre-stored, or can be obtained from other terminal devices during the training of the recognition model. In this embodiment there is at least one image to be trained; the number of images to be trained is not specifically limited, and each image to be trained has at least one corresponding simulation image. The specific type of the image to be trained is determined by the recognition model to be trained: if the recognition model is an OCR model, the image to be trained is a character image containing text, and the simulation image is also a character image; if the recognition model is a face recognition model, the image to be trained is a face image, and the simulation image is also a face image.
Further, according to Zipf's law, the frequency of a word in a natural-language corpus is inversely proportional to its rank in the frequency table, so even with a large amount of labeled data, the proportion of uncommon words remains far from sufficient. Some uncommon words encountered once the OCR recognition model is actually online may not even appear in the training set. The sparsity of uncommon-word data limits the accuracy of OCR recognition models mainly in two respects: (a) uncommon words that do not appear in the training set cannot be recognized; (b) uncommon words that do appear in the training set are insufficiently trained because of the extreme imbalance in sample counts, so their recognition results are inaccurate. In particular, because its sample count is very small, an uncommon word is easily recognized as a visually similar common word with many samples. The recognition accuracy of current character recognition models for uncommon words is therefore low. Accordingly, in order to improve the recognition accuracy of the character recognition model for uncommon words and improve its field adaptability, so that it can be applied to more scenarios, the recognition model is a character recognition model, and the step of constructing the simulation image according to the image to be trained comprises:
step a, obtaining the label characters corresponding to the image to be trained, and generating a target corpus containing uncommon words according to the label characters.
Further, if the recognition model is a character recognition model, then in constructing the simulation image from the image to be trained, the label characters corresponding to the image to be trained (that is, the characters appearing in the image) are first obtained, and a target corpus containing uncommon words is generated from them. It should be noted that the label characters corresponding to an image to be trained are expressed as a character string, each image to be trained has corresponding label characters, and the label characters contain at least one character. An uncommon word may be one that exists in the label characters, or one that does not. Specifically, after the label characters in the image to be trained are acquired, their language is determined, where the language includes but is not limited to Chinese, English and Japanese. After the language of the label characters is determined, character data is crawled from a preset database according to the language so as to find the uncommon words corresponding to the label characters, where the preset database may be Wikipedia or another database containing a large amount of text.
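In the embodiment above, uncommon words are found by crawling a preset database such as Wikipedia. As a simpler stand-in consistent with the Zipf's-law observation, one can flag characters that occur at most a threshold number of times in the existing labels; the sketch below uses this substitute technique, and the threshold and counting scheme are assumptions for illustration, not part of the patent.

```python
from collections import Counter

def find_uncommon_chars(label_strings, max_count=1):
    """Return the characters whose total frequency across all label
    strings is at most max_count (a stand-in for rare-word detection)."""
    counts = Counter(ch for s in label_strings for ch in s)
    return {ch for ch, c in counts.items() if c <= max_count}

labels = ["loan amount", "loan amount", "loan amouqt"]
rare = find_uncommon_chars(labels)  # only "q" occurs just once
```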
Further, step a includes:
step a1, obtaining the label words corresponding to the image to be trained, and constructing to obtain an original corpus according to the label words.
Step a2, determining the uncommon word corresponding to the original corpus, and acquiring the context corresponding to the uncommon word.
Step a3, adding the uncommon word and the context corresponding to the uncommon word into the original corpus to obtain a target corpus.
Further, the label characters corresponding to the images to be trained are obtained, and all the obtained label characters are used as the corpus entries of an original corpus, thereby constructing the original corpus. The uncommon words corresponding to the original corpus (that is, the uncommon words corresponding to the label characters in it) are then determined, and the context corresponding to each uncommon word is acquired. It should be noted that the method for obtaining the context of an uncommon word is the same as the method for obtaining the uncommon word itself, so it is not described in detail here. In this embodiment, the number of context characters is not limited; for example, the context of an uncommon word may be the 5 characters before it and the 5 characters after it, or the 3 characters before it and the 4 characters after it. After the uncommon words and their contexts are determined, they are added to the original corpus, updating its entries to obtain the target corpus. It should be noted that, in this process, an uncommon word and its context can be added to the original corpus together as one character string.
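The context extraction described above (for example, 5 characters before and 5 after each occurrence of an uncommon word) can be sketched as follows; the function name and window parameters are illustrative assumptions.

```python
def context_of(corpus_text, rare_char, before=5, after=5):
    """Return (left, right) context strings around each occurrence of
    rare_char in corpus_text; windows are clipped at the text edges."""
    spans = []
    start = corpus_text.find(rare_char)
    while start != -1:
        left = corpus_text[max(0, start - before):start]
        right = corpus_text[start + 1:start + 1 + after]
        spans.append((left, right))
        start = corpus_text.find(rare_char, start + 1)
    return spans

spans = context_of("abcdefXghijk", "X")  # one occurrence at index 6
```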
Further, the training method of the recognition model further comprises:
step a4, inserting the uncommon word into the label word to update the label word in the original corpus to obtain a target corpus.
Furthermore, after an uncommon word is determined, it is inserted into the label characters in the original corpus so as to update the label characters and obtain the target corpus. It should be noted that, in the target corpus, each set of label characters contains at least one uncommon word. The insertion position is not limited: an uncommon word can be inserted into the label characters at a random position, and the uncommon words inserted into different label characters may be the same or different.
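The random insertion just described can be sketched as below; the seeded generator is only there to make the example reproducible, and the names are illustrative.

```python
import random

def insert_uncommon(label, uncommon, rng=random):
    """Insert an uncommon character at a random position in the label,
    as in step a4 (position chosen uniformly, ends included)."""
    pos = rng.randrange(len(label) + 1)
    return label[:pos] + uncommon + label[pos:]

rng = random.Random(0)                       # fixed seed for the example
updated = insert_uncommon("invoice total", "X", rng)
```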
Step b, acquiring a background image corresponding to the image to be trained, and constructing a simulation image according to the target corpus and the background image.
After the image to be trained is acquired, the background image corresponding to it is acquired; it can be understood that each image to be trained has a corresponding background image. After the target corpus is obtained, a simulation image is constructed from the target corpus and the background image, with each background image yielding at least one simulation image. That is, a target character string is selected from the target corpus and embedded into the background image to obtain a simulation image.
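As a toy illustration of embedding a target character string into a background image, the sketch below treats the background as a grid of characters. This is an assumption for illustration only: a real implementation would render the string in the chosen simulation font onto a background bitmap (for example with an image library), which the patent describes but for which it prescribes no API.

```python
# Toy model: the "background image" is a grid of pixels (here, characters),
# and embedding writes the target string at a chosen position.

def embed_string(background, text, row, col):
    """Return a copy of the background grid with text written at
    (row, col); stands in for rendering a string onto a bitmap."""
    grid = [list(r) for r in background]
    for k, ch in enumerate(text):
        grid[row][col + k] = ch
    return ["".join(r) for r in grid]

bg = ["........", "........"]
sim = embed_string(bg, "abc", 1, 2)
```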
Step S20, determining a training data set according to the image to be trained and the simulation image.
After the simulation images are obtained, a training data set is determined from the images to be trained and the simulation images; the training data set contains both. In this embodiment, the numbers of images to be trained and of simulation images in the training data set are not specifically limited; for example, there may be 80 images to be trained and 20 simulation images, or 45 images to be trained and 55 simulation images.
Step S30, training a generative adversarial network and a recognition network in a preset neural network model based on the training data set to obtain a recognition model.
After the training data set is obtained, a generative adversarial network and a recognition network in a preset neural network model are trained based on it to obtain a recognition model. The neural network model may be a deep feed-forward network (DFF), a recurrent neural network (RNN), a long short-term memory network (LSTM), or the like.
Further, step S30 includes:
and c, fixing a generated countermeasure network in a preset neural network model, and optimizing a recognition network in the neural network model by adopting a gradient descent algorithm based on the training data set, so that the generated countermeasure network judges hidden layer data obtained by the simulation image in the training data set through the generated countermeasure network as real data, wherein the generated countermeasure network is a branch of the recognition network.
Further, after the training data set is obtained, the images to be trained and the simulation images in it are mixed and input into the neural network model. Specifically, they can be input in batches, that is, a number of images to be trained and simulation images are input at one time; this embodiment does not specifically limit how many of each are input per batch. It should be noted that, before being input into the neural network model, each image carries a corresponding type label, so that it is possible to determine which of the input data are images to be trained and which are simulation images. This embodiment does not specifically limit the representation of the type label.
After the images to be trained and the simulation images are input into the neural network model, they pass through its feature extraction layer to obtain the corresponding feature data, and the feature data passes through the fully connected layer corresponding to each hidden layer to obtain the corresponding hidden-layer data, where the feature extraction layer may be a ResNet (residual neural network) structure, a VGG (Visual Geometry Group) structure, or the like. In this embodiment there are 3 types of hidden-layer data; in other embodiments there may be 4 types or 2 types. For example, for a face image, the first type of hidden-layer data relates to the eyes, the second type to the nose, and the third type to the mouth. It can be understood that different hidden-layer data represent different positions of the simulation image and the image to be trained, or feature data of different dimensions, such as different scales; in this embodiment the feature data is represented in matrix form.
In this embodiment, attention is paid to the hidden-layer data obtained after the simulation images and the images to be trained pass through the feature extraction layer of the neural network model, so that the distributions of the images to be trained and the simulation images are distinguished at different scales and different levels.
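The idea that different hidden layers carry feature data at different scales can be illustrated with a toy average-pooling example. This stand-in is an assumption for illustration only: in the embodiment the representations come from fully connected layers after a feature extractor such as ResNet or VGG, not from pooling.

```python
# Toy sketch: two "hidden-layer" representations of one feature matrix,
# at a fine scale (1x1 blocks) and a coarse scale (2x2 blocks).

def avg_pool(matrix, size):
    """Average-pool a 2-D list of floats with square windows of the
    given size, clipping windows at the matrix edges."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows, size):
        row = []
        for c in range(0, cols, size):
            block = [matrix[i][j]
                     for i in range(r, min(r + size, rows))
                     for j in range(c, min(c + size, cols))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

features = [[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0]]
hidden_a = avg_pool(features, 1)   # fine scale: unchanged matrix
hidden_b = avg_pool(features, 2)   # coarse scale: one 2x2 average per block
```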
After the hidden-layer data is obtained, the generative adversarial network in the neural network model is fixed; the recognition network in the neural network model is then optimized with a gradient descent algorithm based on the training data set, so that the generative adversarial network judges the simulation images in the training data set to be real data from the hidden-layer data obtained through it, where the generative adversarial network is a branch of the recognition network and the simulation images are judged to be images to be trained. Gradient descent is an iterative method that can be used to solve least-squares problems (both linear and non-linear); the gradient descent algorithm in this embodiment is stochastic gradient descent. In this embodiment, stochastic gradient descent is used to reduce the CTC (Connectionist Temporal Classification) recognition loss of the images to be trained and the simulation images, and to reduce the MMD (Maximum Mean Discrepancy) between the hidden-layer representations of the images to be trained and of the simulation images within a batch, so that the generative adversarial network judges the simulation images in the training data set to be real data, with the binary cross-entropy reduced accordingly. It should be noted that the MMD does not represent the difference between one image to be trained and one simulation image, but the difference between all images to be trained and all simulation images, that is, between the whole sample of images to be trained and the whole sample of simulation images.
Specifically, the MMD calculation formula may be expressed as:

$$\mathrm{MMD}^2(X, Y) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(y_i, y_j)$$

wherein the function space $F$ in the general definition of the MMD is taken to be the unit ball of the reproducing-kernel Hilbert space induced by the kernel function $k$, which yields the kernel form above (in this embodiment, $k$ may be a Gaussian kernel); $X$ represents the images to be trained in the batch, i.e. the real data in the batch; $Y$ represents the simulation images in the batch, i.e. the simulation data in the batch; $m$ is the number of images to be trained in one batch; $n$ is the number of simulation images in one batch; $x_i$ and $x_j$ represent the optimized hidden-layer data corresponding to the $i$-th and $j$-th images to be trained; and $y_i$ and $y_j$ represent the optimized hidden-layer data corresponding to the $i$-th and $j$-th simulation images.
It should be noted that minimizing the MMD makes the distributions of the real data and the simulation data in the neural network model as consistent as possible, while the fixed generative adversarial network judges the simulation data to be real data, that is, judges the simulation images to be images to be trained; as the distributions become consistent, the classifier in the neural network model cannot distinguish the real data from the simulation data. In this embodiment, the generative adversarial network is a branch of the recognition network, and the recognition network is the backbone relative to it. It can be understood that the parameters of the generative adversarial network are not changed while it is fixed.
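The batch MMD can be computed directly from its kernel form. The following is a minimal pure-Python sketch using a Gaussian kernel; the bandwidth value is an illustrative assumption, and hidden-layer vectors are represented as tuples of floats.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd_squared(X, Y, k=gaussian_kernel):
    """Biased estimate of MMD^2 between samples X (real hidden-layer
    data) and Y (simulated hidden-layer data), per the kernel form."""
    m, n = len(X), len(Y)
    xx = sum(k(a, b) for a in X for b in X) / (m * m)
    yy = sum(k(a, b) for a in Y for b in Y) / (n * n)
    xy = sum(k(a, b) for a in X for b in Y) / (m * n)
    return xx + yy - 2.0 * xy

same = mmd_squared([(0.0,), (1.0,)], [(0.0,), (1.0,)])  # identical samples
far = mmd_squared([(0.0,), (0.1,)], [(5.0,), (5.1,)])   # distant samples
```

Identical samples give an MMD of zero, while well-separated samples give a clearly positive value, which is what makes the statistic usable as a distribution-matching loss term.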
Step d, fixing the recognition network, and using a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set to be simulation data and the images to be trained to be real data, thereby obtaining a recognition model through training.
Once the generative adversarial network judges the simulation images in the training data set to be real data, it is unfixed and the recognition network is fixed instead; a gradient descent algorithm is then used so that the generative adversarial network judges the simulation images in the training data set to be simulation data and the images to be trained to be real data, yielding a discriminator with stronger discrimination capability in the adversarial branch, and the recognition model is obtained through the above training steps. It should be noted that the gradient descent algorithm used after the recognition network is fixed is a stochastic gradient descent algorithm, which may be the same as or different from the one in step c. It should also be noted that step d improves the discrimination capability of the classifier in the neural network model relative to step c; if, after steps c and d, the simulation data can still be identified as real data in each hidden-layer representation, this indicates that the distributions of the images to be trained and the simulation images in the neural network model are close, that is, the distributions of the simulation data and the real data are close.
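Steps c and d alternate between freezing one part of the model and updating the other. The schematic skeleton below shows only this alternation; the log entries are stubs standing in for the actual stochastic-gradient updates of the CTC loss, the MMD terms and the discriminator's binary cross-entropy.

```python
# Schematic of the alternating scheme in steps c and d; purely
# illustrative stubs, not the patent's implementation.

def train_adversarially(num_rounds):
    log = []
    for r in range(num_rounds):
        # Step c: freeze the GAN branch, update the recognition network
        # so simulated hidden-layer data is judged as real.
        log.append((r, "update_recognition"))
        # Step d: freeze the recognition network, update the GAN branch
        # so it separates simulated from real data again.
        log.append((r, "update_discriminator"))
    return log

history = train_adversarially(2)
```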
It can be understood that this embodiment reduces the distribution difference between the simulation image and the image to be trained inside the neural network model, so that the knowledge learned from the simulation image can be transferred, which better improves the recognition accuracy of the recognition model. In addition, the generation countermeasure network is embedded into the recognition network as a branch for adversarial learning; compared with the two-stage method of first training a generation countermeasure network to generate data and then feeding the generated data into a new recognition network for retraining, the method in this embodiment is simpler, that is, the training difficulty of the recognition model is reduced.
Furthermore, since the real data and the simulation data are distributed very closely inside the neural network model, the difference between them is reduced, which mitigates the drop in recognition accuracy that would otherwise occur when the recognition model is trained on simulation data. Moreover, because uncommon words are present in the simulation image, the trained character recognition model gains the ability to recognize uncommon words. Although these uncommon words are simulated, the difference between the simulation image and the image to be trained inside the neural network model is very small; that is, a simulated uncommon word and a real uncommon word have nearly the same effect in the neural network model, so the simulated uncommon words serve the role of real ones.
Specifically, referring to fig. 2, fig. 2 is a schematic flow chart of obtaining a recognition model according to an embodiment of the present invention. First, a simulation picture A (simulation image) is generated according to a specific rule from the acquired character labels containing uncommon words. The simulation picture A and a real picture B (image to be trained) then pass through the feature extraction layer of the recognition network in the neural network model to obtain feature data, and the corresponding hidden layer data is obtained through the fully connected layers (fc1, fc2) of the corresponding hidden layers. It should be noted that fig. 2 shows two sets of hidden layer data, hidden layer A and hidden layer B. Hidden layer A1 and hidden layer B1 are obtained from the same hidden layer through full connection, but their subsequent processing differs: the hidden layer data corresponding to hidden layer A1 is fed into the corresponding discriminator 1 to obtain the corresponding loss value (loss), while hidden layer B1 yields the corresponding MMD. The processing of hidden layer A2 and hidden layer A3 is similar to that of hidden layer A1, and the processing of hidden layer B2 and hidden layer B3 is similar to that of hidden layer B1, so they are not repeated here. The CTC loss is obtained after the hidden layer data is obtained.
It should be noted that, to obtain the recognition model, the loss corresponding to each discriminator, the CTC loss, and the MMD must all be smaller than their corresponding preset values. If any one of them is not smaller than its preset value, the iterative training of the neural network model continues until all of them are. This embodiment does not limit the sizes of the preset values corresponding to the discriminator loss, the CTC loss, and the MMD.
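Since the MMD between real and simulated hidden layer data is one of the quantities compared against a preset threshold, a small sketch may help. The RBF kernel, its bandwidth, and the toy 4-D features below are assumptions for illustration; the embodiment does not fix the kernel or the thresholds.

```python
import numpy as np

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets x and y, RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def training_done(disc_loss, ctc_loss, mmd, thresholds):
    """Stop only when every monitored value is below its preset threshold."""
    return all(v < t for v, t in zip((disc_loss, ctc_loss, mmd), thresholds))

rng = np.random.default_rng(1)
real_hidden = rng.normal(size=(100, 4))       # hidden layer data of real images
sim_far = rng.normal(size=(100, 4)) + 2.0     # large domain gap
sim_near = rng.normal(size=(100, 4)) + 0.1    # small domain gap

mmd_far = gaussian_mmd2(real_hidden, sim_far)
mmd_near = gaussian_mmd2(real_hidden, sim_near)
```

A larger domain gap gives a larger MMD, so driving this value below its threshold is exactly what "the distributions become closer" means in the flow of fig. 2.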
In this embodiment, the recognition model is obtained by acquiring an image to be trained, constructing a simulation image from it, determining a training data set from the image to be trained and the simulation image, and training the generation countermeasure network and the recognition network in a preset neural network model based on that training data set. Since the training data set is obtained by constructing simulation images from the acquired images, the low recognition accuracy caused by an insufficient number of training samples is avoided; that is, the recognition accuracy of the trained recognition model is improved.
Further, a second embodiment of the training method of the recognition model of the present invention is provided. The second embodiment of the training method of the recognition model differs from the first embodiment of the training method of the recognition model in that step b comprises:
Step b1, obtaining a background image, in the image to be trained, that has no corresponding characters, and obtaining a target character string corresponding to the background image from the target corpus.
After the image to be trained is acquired, a background image without corresponding characters is obtained from it, and a target character string corresponding to the background image is obtained from the target corpus. It should be noted that, to ensure the similarity between the constructed simulation image and the image to be trained, the obtained target character string corresponds to the image to be trained, that is, it contains the label characters corresponding to the image to be trained.
Step b2, determining the simulation font of the label character in the simulation image according to the character font corresponding to the target character string.
After the target character string is determined, the simulation font of the label characters in the simulation image is determined according to the character font corresponding to the target character string. Specifically, the character font corresponding to the target character string is the font of the label characters corresponding to the image to be trained; for example, if that font is a regular script, the simulation font is also determined to be a regular script. In determining the simulation font, if a font identical to the character font corresponding to the target character string exists, the simulation font is that same font; if no identical font exists, the font with the greatest similarity to the character font corresponding to the target character string is selected from all available fonts and determined as the simulation font. Alternatively, if no identical font exists, a font similar to the character font corresponding to the target character string may be randomly selected and determined as the simulation font, where a similar font is one whose similarity to the character font corresponding to the target character string is greater than a specific similarity. The specific similarity may be set as needed and is not limited in this embodiment.
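The font-selection rule of step b2 (exact match first, otherwise the most similar available font, or optionally a random font above a similarity threshold) can be sketched as follows. The similarity function here is a hypothetical stand-in, since the embodiment does not fix how font similarity is measured.

```python
import random

def choose_simulation_font(target_font, available_fonts, similarity,
                           specific_similarity=None):
    """Pick the simulation font for step b2.

    similarity(a, b) -> float in [0, 1] is a hypothetical font-similarity
    measure. If specific_similarity is given, a random qualifying font is
    chosen (the embodiment's alternative strategy); otherwise the single most
    similar available font is used.
    """
    if target_font in available_fonts:
        return target_font                    # an identical font exists: use it
    if specific_similarity is not None:
        candidates = [f for f in available_fonts
                      if similarity(target_font, f) > specific_similarity]
        if candidates:
            return random.choice(candidates)  # random font above the threshold
    # Otherwise: the available font with the greatest similarity.
    return max(available_fonts, key=lambda f: similarity(target_font, f))

# Toy similarity table for illustration only.
sim_table = {("KaiTi", "KaiTi-Lite"): 0.9, ("KaiTi", "SimSun"): 0.4}
sim = lambda a, b: 1.0 if a == b else sim_table.get((a, b), 0.0)
```

For example, `choose_simulation_font("KaiTi", ["KaiTi-Lite", "SimSun"], sim)` falls back to the most similar available font, `"KaiTi-Lite"`.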
Step b3, embedding the target character string into the corresponding background image in the form of the simulated font to construct a simulated image.
After the simulation font, the target character string, and the background image are determined, the target character string is embedded into the corresponding background image in the simulation font, thereby constructing the simulation image. Furthermore, the size of the simulation image is the same as that of the corresponding image to be trained, and when the target character string is embedded into the background image, its position in the simulation image should match the position of the label characters in the image to be trained as closely as possible. It can be understood that the target character string is simply the label text of the simulation image.
Further, the step of embedding the target text string in the background image in the form of the simulated font to construct a simulated image includes:
step b31, embedding the target character string into the background image in the form of the simulated font to obtain an initial image.
And b32, carrying out noise processing on the initial image to construct a simulation image.
Further, the target character string is embedded into the background image in the simulation font to obtain an initial image, and noise is then added to the initial image, that is, noise is added to blur the initial image, thereby constructing the simulation image. Specifically, during the noise-adding process, the amount of added noise is determined by the clarity of the corresponding image to be trained, so that after noise is added the clarity of the simulation image is close to or the same as that of the corresponding image to be trained.
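Steps b31 and b32 — embedding the string into a text-free background patch and then adding noise matched to the real image's clarity — might look like the following NumPy sketch. The glyph bitmap, stroke color, and Gaussian noise model are illustrative assumptions; a real pipeline would rasterize the chosen simulation font instead of using a mock glyph.

```python
import numpy as np

rng = np.random.default_rng(2)

def build_simulation_image(background, glyph, top_left, noise_std):
    """Embed a glyph bitmap into a text-free background (step b31), then add
    Gaussian noise so the result approximates the clarity of the corresponding
    real training image (step b32)."""
    img = background.astype(float).copy()
    r, c = top_left
    h, w = glyph.shape
    patch = img[r:r + h, c:c + w]
    img[r:r + h, c:c + w] = np.where(glyph > 0, 0.0, patch)  # dark strokes
    noisy = img + rng.normal(0.0, noise_std, img.shape)      # step b32: add noise
    return np.clip(noisy, 0.0, 255.0)

background = np.full((32, 96), 230.0)   # light, text-free background patch
glyph = np.zeros((16, 16))              # mock "+"-shaped glyph, for illustration
glyph[2:14, 7:9] = 1
glyph[7:9, 3:13] = 1
sim = build_simulation_image(background, glyph, (8, 40), noise_std=8.0)
```

The `noise_std` parameter plays the role of the clarity match: the blurrier or noisier the real image to be trained, the larger the value passed in.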
In this embodiment, through the process of constructing the simulation image, the font of the label characters in the simulation image is the same as or similar to that in the corresponding image to be trained, and/or the noise condition of the simulation image is the same as or similar to that of the image to be trained. This ensures that the simulation image and the image to be trained are processed as similarly as possible in the neural network model, improving the recognition accuracy of the obtained recognition model.
Further, a third embodiment of the training method of the recognition model of the present invention is provided. The third embodiment differs from the first and/or second embodiment in that the recognition model is an Optical Character Recognition (OCR) model, which is one type of character recognition model. Referring to FIG. 3, the training method of the recognition model further includes:
and step S40, after receiving the image to be recognized, inputting the image to be recognized into the OCR recognition model.
And step S50, determining characters in the image to be recognized according to the output result of the OCR recognition model.
In the present embodiment, the recognition model is an OCR recognition model. After an image to be recognized is received, it is input into the OCR recognition model to obtain an output result, and the characters in the image to be recognized are determined according to that output result. It should be noted that, because uncommon words exist in the constructed simulation images during training, that is, in the label characters of the simulation images, uncommon words in the image to be recognized can be recognized by the OCR recognition model. The output result of the OCR recognition model is the specific characters in the image to be recognized.
In this embodiment, after an image to be recognized is received, it is input into the OCR recognition model, so that the characters in the image to be recognized are accurately recognized by the OCR recognition model.
In addition, the present invention provides a training apparatus for recognition models, and referring to fig. 4, the training apparatus for recognition models includes:
an obtaining module 10, configured to obtain an image to be trained;
a constructing module 20, configured to construct a simulation image according to the image to be trained;
a determining module 30, configured to determine a training data set according to the image to be trained and the simulation image;
and the training module 40 is used for training a generation countermeasure network and a recognition network in a preset neural network model based on the training data set to obtain a recognition model.
Further, the recognition model is a character recognition model, and the constructing module 20 includes:
the acquisition unit is used for acquiring the label characters corresponding to the image to be trained;
the generating unit is used for generating a target language library containing uncommon words according to the label characters;
the acquisition unit is further used for acquiring a background image corresponding to the image to be trained;
and the construction unit is used for constructing and obtaining a simulation image according to the target corpus and the background image.
Further, the generation unit includes:
the construction subunit is used for constructing an original corpus according to the label characters;
the first determining subunit is used for determining the uncommon word corresponding to the original corpus;
the first acquiring subunit is used for acquiring a context corresponding to the uncommon word;
and the adding subunit is used for adding the uncommon word and the context corresponding to the uncommon word into the original corpus to obtain a target corpus.
Further, the construction module 20 further includes:
and the inserting unit is used for inserting the uncommon word into the label words so as to update the label words in the original corpus and obtain a target corpus.
Further, the obtaining unit is further configured to obtain a background image in the image to be trained, where the background image does not correspond to the text;
the construction unit includes:
the second acquiring subunit is configured to acquire, in the target corpus, a target text string corresponding to the background image;
the second determining subunit is configured to determine, according to the character font corresponding to the target character string, a simulation font of the tagged character in the simulation image;
and the embedding subunit is used for embedding the target character string into the corresponding background image in the form of the simulated font so as to construct a simulated image.
Further, the embedding subunit is further configured to embed the target text string into the background image in the form of the simulated font, so as to obtain an initial image; and carrying out noise processing on the initial image to construct and obtain a simulation image.
Further, the training module 40 includes:
the fixed unit is used for fixing the generation countermeasure network in the preset neural network model;
the optimization unit is used for optimizing the recognition network in the neural network model by a gradient descent algorithm based on the training data set, so that the generation countermeasure network judges, as real data, the hidden layer data obtained from the simulation images in the training data set through the generation countermeasure network, wherein the generation countermeasure network is a branch of the recognition network;
the fixing unit is also used for fixing the identification network;
and the judging unit is used for causing the generation countermeasure network, by a gradient descent algorithm, to judge the simulation images in the training data set as simulation data and the images to be trained as real data, so as to obtain the recognition model through training.
Further, the training device for identifying the model further comprises:
the input module is used for inputting the image to be recognized into the OCR recognition model after receiving the image to be recognized;
the determining module 30 is further configured to determine the text in the image to be recognized according to the output result of the OCR recognition model.
The specific implementation of the training device for the recognition model of the present invention is basically the same as that of each embodiment of the above training method for the recognition model, and is not described herein again.
In addition, the invention also provides training equipment for identifying the model. As shown in fig. 5, fig. 5 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that fig. 5 is a schematic structural diagram of the hardware operating environment of the training device for the recognition model. In the embodiment of the present invention, the training device for the recognition model may be a terminal device such as a PC or a portable computer.
As shown in fig. 5, the training device for the recognition model may include a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004, and a communication bus 1002. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory); the memory 1005 may alternatively be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the training apparatus configuration of the recognition model shown in FIG. 5 does not constitute a limitation of the training apparatus of the recognition model, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 5, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a training program of a recognition model. The operating system is a program that manages and controls the hardware and software resources of the training device of the recognition model and supports the running of the training program of the recognition model and of other software or programs.
In the training device for the recognition model shown in fig. 5, the user interface 1003 is mainly used for connecting to a terminal device and performing data communication with it, for example receiving an image to be recognized or an image to be trained sent by the terminal device; the network interface 1004 is mainly used for connecting to a background server and performing data communication with it; and the processor 1001 may be configured to invoke the training program of the recognition model stored in the memory 1005 and perform the steps of the training method of the recognition model described above.
The specific implementation of the training device for the recognition model of the present invention is basically the same as that of each embodiment of the above training method for the recognition model, and is not described herein again.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a training program of a recognition model is stored on the computer-readable storage medium, and when executed by a processor, the training program of the recognition model implements the steps of the training method of the recognition model as described above.
The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of each embodiment of the above-mentioned training method for the recognition model, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A training method for recognition models is characterized by comprising the following steps:
acquiring an image to be trained, and constructing a simulation image according to the image to be trained;
determining a training data set according to the image to be trained and the simulation image;
and based on the training data set, training a generation countermeasure network and a recognition network in a preset neural network model to obtain a recognition model.
2. The training method of the recognition model according to claim 1, wherein the recognition model is a character recognition model, and the step of constructing the simulation image according to the image to be trained comprises:
acquiring label characters corresponding to the image to be trained, and generating a target corpus containing uncommon words according to the label characters;
and acquiring a background image corresponding to the image to be trained, and constructing according to the target corpus and the background image to obtain a simulation image.
3. The method for training the recognition model according to claim 2, wherein the step of obtaining the label words corresponding to the image to be trained and generating the target corpus containing the uncommon words according to the label words comprises:
obtaining label characters corresponding to the image to be trained, and constructing according to the label characters to obtain an original corpus;
determining uncommon words corresponding to the original corpus and acquiring contexts corresponding to the uncommon words;
and adding the uncommon word and the context corresponding to the uncommon word into the original corpus to obtain a target corpus.
4. The method for training recognition models according to claim 2, wherein the step of determining the uncommon word corresponding to the original corpus further comprises:
and inserting the uncommon word into the label characters to update the label characters in the original corpus so as to obtain a target corpus.
5. The method for training the recognition model according to claim 2, wherein the step of obtaining the background image corresponding to the image to be trained and constructing and obtaining the simulation image according to the target corpus and the background image comprises:
acquiring a background image without corresponding characters in the image to be trained, and acquiring a target character string corresponding to the background image from the target corpus;
determining the simulation font of the tagged characters in the simulation image according to the character font corresponding to the target character string;
and embedding the target character string into the corresponding background image in the form of the simulation font so as to construct a simulation image.
6. The method for training a recognition model according to claim 5, wherein the step of embedding the target text string in the form of the simulated font into the corresponding background image to construct a simulated image comprises:
embedding the target character string into the background image in the form of the simulation font to obtain an initial image;
and carrying out noise processing on the initial image to construct and obtain a simulation image.
7. The training method of the recognition model according to claim 1, wherein the step of obtaining the recognition model through the generation countermeasure network and the recognition network training in the preset neural network model based on the training data set comprises:
fixing a generation countermeasure network in a preset neural network model, and optimizing a recognition network in the neural network model by adopting a gradient descent algorithm based on the training data set, so that the generation countermeasure network judges, as real data, hidden layer data obtained from the simulation image in the training data set through the generation countermeasure network, wherein the generation countermeasure network is a branch of the recognition network;
fixing the recognition network, adopting a gradient descent algorithm to enable the generation countermeasure network to judge the simulation images in the training data set as simulation data, and judging the images to be trained as real data to obtain a recognition model through training.
8. The method for training a recognition model according to any one of claims 1 to 7, wherein the recognition model is an Optical Character Recognition (OCR) recognition model, and the step of training the generated countermeasure network and the recognition network in the preset neural network model based on the training data set further comprises:
after receiving an image to be recognized, inputting the image to be recognized into the OCR recognition model;
and determining characters in the image to be recognized according to the output result of the OCR recognition model.
9. A training apparatus for recognizing a model, comprising:
the acquisition module is used for acquiring an image to be trained;
the construction module is used for constructing a simulation image according to the image to be trained;
the determining module is used for determining a training data set according to the image to be trained and the simulation image;
and the training module is used for training a generated countermeasure network and a recognition network in a preset neural network model to obtain a recognition model based on the training data set.
10. A training apparatus for recognition models, characterized in that the training apparatus for recognition models comprises a memory, a processor and a training program for recognition models stored on the memory and executable on the processor, the training program for recognition models, when executed by the processor, implementing the steps of the training method for recognition models as claimed in any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a training program of a recognition model, which when executed by a processor implements the steps of the training method of a recognition model according to any one of claims 1 to 8.
CN202010453100.5A 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model Active CN111612081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010453100.5A CN111612081B (en) 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model

Publications (2)

Publication Number Publication Date
CN111612081A true CN111612081A (en) 2020-09-01
CN111612081B CN111612081B (en) 2024-04-02

Family

ID=72200615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010453100.5A Active CN111612081B (en) 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model

Country Status (1)

Country Link
CN (1) CN111612081B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232293A (en) * 2020-11-09 2021-01-15 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN112418297A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 OCR model training method, system and device based on image expansion
CN113657396A (en) * 2021-08-17 2021-11-16 北京百度网讯科技有限公司 Training method, translation display method, device, electronic equipment and storage medium
CN114722208A (en) * 2022-06-08 2022-07-08 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563385A (en) * 2017-09-02 2018-01-09 西安电子科技大学 License plate character recognition method based on deep convolutional generative adversarial networks
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN108230332A (en) * 2017-10-30 2018-06-29 北京市商汤科技开发有限公司 Character image processing method and apparatus, electronic device, and computer storage medium
CN108416326A (en) * 2018-03-27 2018-08-17 百度在线网络技术(北京)有限公司 Face recognition method and device
CN109190665A (en) * 2018-07-30 2019-01-11 国网上海市电力公司 General image classification method and device based on semi-supervised generative adversarial networks
CN109492099A (en) * 2018-10-28 2019-03-19 北京工业大学 Cross-domain text sentiment classification method based on domain-adversarial adaptation
CN110009057A (en) * 2019-04-16 2019-07-12 四川大学 Graphical CAPTCHA recognition method based on deep learning
CN110096964A (en) * 2019-04-08 2019-08-06 厦门美图之家科技有限公司 Method for generating an image recognition model
CN110097103A (en) * 2019-04-22 2019-08-06 西安电子科技大学 Semi-supervised image classification method based on generative adversarial networks
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for arbitrarily-shaped scene text
CN110378350A (en) * 2019-07-23 2019-10-25 中国工商银行股份有限公司 Text recognition method, apparatus and system
WO2019232843A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
CN110852359A (en) * 2019-07-24 2020-02-28 上海交通大学 Family tree recognition method and system based on deep learning
CN111144305A (en) * 2019-12-26 2020-05-12 京东数字科技控股有限公司 Training data generation method and device, electronic equipment and medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
XUANQING LIU, CHO-JUI HSIEH: "Rob-GAN: Generator, Discriminator, and Adversarial Attacker", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pages 11226 - 11235 *
ZHENG Z, YU Z, ZHENG H, et al.: "Generative Adversarial Network with Multi-Branch Discriminator for Cross-Species Image-to-Image Translation", ARXIV, 24 January 2019 (2019-01-24), pages 1 - 6 *
CAO Tingrong, LU Ling, GONG Yanhong, et al.: "CAPTCHA recognition method based on adversarial networks", Computer Engineering and Applications, vol. 56, no. 8, pages 199 - 204 *
LI Kai, PENG Yigong: "Improved image recognition method based on generative adversarial networks", Computer Engineering and Design, vol. 40, no. 1, pages 2 - 3 *
CHEN Wenbing; GUAN Zhengxiong; CHEN Yunjie: "Data augmentation method based on conditional generative adversarial networks", Journal of Computer Applications, no. 11, 10 November 2018 (2018-11-10), pages 259 - 265 *
HUANG Hongyu, GU Zifeng: "A text-to-image generative adversarial network based on self-attention mechanism", Journal of Chongqing University, vol. 43, no. 03, 24 April 2020 (2020-04-24), pages 55 - 61 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232293A (en) * 2020-11-09 2021-01-15 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN112418297A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 OCR model training method, system and device based on image expansion
CN113657396A (en) * 2021-08-17 2021-11-16 北京百度网讯科技有限公司 Training method, translation display method, device, electronic equipment and storage medium
CN113657396B (en) * 2021-08-17 2024-02-09 北京百度网讯科技有限公司 Training method, translation display method, device, electronic equipment and storage medium
CN114722208A (en) * 2022-06-08 2022-07-08 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts

Also Published As

Publication number Publication date
CN111612081B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN111612081B (en) Training method, device, equipment and storage medium for recognition model
CN108717406B (en) Text emotion analysis method and device and storage medium
US20200004815A1 (en) Text entity detection and recognition from images
CN111198948A (en) Text classification correction method, device and equipment and computer readable storage medium
CN110781460A (en) Copyright authentication method, device, equipment, system and computer readable storage medium
CN109033798B (en) Click verification code identification method and device based on semantics
CN113469067B (en) Document analysis method, device, computer equipment and storage medium
CN114155529A (en) Illegal advertisement identification method combining character visual features and character content features
CN115357699A (en) Text extraction method, device, equipment and storage medium
CN112269872A (en) Resume analysis method and device, electronic equipment and computer storage medium
CN112101346A (en) Verification code identification method and device based on target detection
KR102282025B1 (en) Method for automatically sorting documents and extracting characters by using computer
CN111414758B (en) Zero-reference position detection method, device, equipment and computer-readable storage medium
CN112801186A (en) Verification image generation method, device and equipment
CN110750984B (en) Command line character string processing method, terminal, device and readable storage medium
CN112036169A (en) Event recognition model optimization method, device and equipment and readable storage medium
CN111638792A (en) AR effect presentation method and device, computer equipment and storage medium
CN116225956A (en) Automated testing method, apparatus, computer device and storage medium
CN114625872A (en) Risk auditing method, system and equipment based on global pointer and storage medium
CN115510457A (en) Data identification method, device, equipment and computer program product
CN111598075A (en) Picture generation method and device and readable storage medium
CN115129914A (en) Image-text data processing method and device, electronic equipment and computer storage medium
CN112035668A (en) Event subject recognition model optimization method, device and equipment and readable storage medium
US11763589B1 (en) Detection of blanks in documents
CN110851349A (en) Page abnormal display detection method, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant